Method of performing reverse transcription reaction using reverse transcriptase encoded by non-LTR retrotransposable element

ABSTRACT

The present invention relates to a method of preparing a cDNA molecule which includes: contacting an RNA molecule, in the presence of dNTPs, with a non-LTR retrotransposon protein or polypeptide having reverse transcriptase activity under conditions effective for production of a cDNA molecule complementary to the RNA molecule, said contacting being carried out in the absence of a target DNA molecule of the non-LTR retrotransposon protein or polypeptide; and isolating the cDNA molecule.

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 60/229,075 to Eickbush et al., filed Aug. 30, 2000, which is hereby incorporated by reference in its entirety.

This invention was made, at least in part, utilizing funding received from the National Institutes of Health grant GM42790. The U.S. government may have certain rights in this invention.

FIELD OF THE INVENTION

The present invention relates generally to the preparation of nucleic acid molecules using a protein or polypeptide having reverse transcriptase activity, particularly proteins or polypeptides which have reverse transcriptase activity and are encoded by a class of non-long terminal repeat (“non-LTR”) retrotransposable elements.

BACKGROUND OF THE INVENTION

Reverse transcriptases, enzymes that catalyze RNA-dependent DNA synthesis, have been used as a component of transcription-based amplification systems. These systems amplify RNA and DNA target sequences up to 1 trillion fold. Exemplary systems are disclosed in PCT Patent Application WO 89/01050 to Burg et al.; PCT Patent Application WO 88/10315 to Gingeras et al.; European Patent Application 0 329 822 to Davey and Malek; European Patent Application 0 373 960 to Gingeras et al.; PCT Patent Application WO 91/02814 to Malek and Davey; and European Patent Application 0 408 295 A2 to Kacian and Fultz. Others have also been described or are otherwise commercially available.

Some of the transcription-based amplification methods are exceptionally convenient since the amplification reaction according to these methods is isothermal. Thus, these systems are particularly suited for routine clinical laboratory use in diagnostic tests (e.g., pathogen detection, cancer detection, etc.). Reverse transcriptases are also employed as an initial step in some protocols when the polymerase chain reaction (PCR) is used to amplify an RNA target. See U.S. Pat. No. 5,130,238 to Malek et al.; and Mocharla et al., Gene 99:271-275 (1990). In such “RT-PCR” procedures, the reverse transcriptase is used to make an initial complementary DNA (“cDNA”) copy of the RNA target, which is then amplified by successive rounds of DNA replication.

Reverse transcriptases were once believed to be enzymes unique to the replication of retroviruses (Baltimore, “RNA-dependent DNA Polymerase in Virions of RNA Tumor Viruses,” Nature 226:1209-1211 (1970); Temin and Mizutani, “RNA-Directed DNA Polymerase in Virions of Rous Sarcoma Viruses,” Nature 226:1211-1213 (1970)). Reverse transcriptases are now known to be encoded by a wide range of genetic elements in both eukaryotes and prokaryotes (Varmus, “Reverse Transcription,” Sci. Amer. 257:56-66 (1987); Temin, “Retrons in Bacteria,” Nature 339:254-255 (1989)).

Most commercially available reverse transcriptases, however, are retroviral in origin. The retroviral reverse transcriptases have three enzymatic activities: an RNA-directed DNA polymerase activity, a DNA-directed DNA polymerase activity, and an RNAse H activity (Verma, “The Reverse Transcriptase,” Biochim. Biophys. Acta 473:1-38 (1977)). The latter activity specifically degrades RNA contained in an RNA:DNA duplex. Degradation of the RNA strand of RNA:DNA intermediates by RNAse H is an important component of some transcription-based amplification systems and is to be distinguished from unwanted degradation due to contaminating nucleases, which interferes with amplification. While retroviral-derived reverse transcriptases lacking RNAse H activity have been developed (U.S. Pat. No. 6,063,608 to Kotewicz et al.), it should be noted that retroviral reverse transcriptases typically have several properties which limit their usefulness. These include: the necessity to use a primer that will anneal to the RNA template, the low processivity of the enzymes (i.e., the tendency to dissociate from the RNA before reaching the end), and the inability of the enzymes to transcribe through regions of RNA secondary structure.

Eukaryotic genomes in particular are filled with mobile elements, retrotransposons, that use reverse transcriptase for replication. The reverse transcriptases encoded by non-LTR retrotransposons are highly divergent in sequence from the retroviral enzymes and utilize entirely different mechanisms to prime cDNA synthesis.

One of the most abundant classes of reverse transcriptase-encoding elements is the non-LTR retrotransposons (also called LINEs, retroposons and polyA-retrotransposons). Studies of the purified reverse transcriptase from the R2 element of the silkmoth, Bombyx mori, have provided insights into the mechanism of non-LTR retrotransposition (Luan et al., “Reverse Transcription of R2Bm RNA is Primed by a Nick at the Chromosomal Target Site: A Mechanism for non-LTR Retrotransposition,” Cell 72:595-605 (1993)). R2 elements are specialized for insertion into the 28S ribosomal RNA (rRNA) genes found in the nucleoli of eukaryotic cells. The 120 kilodalton protein encoded by R2 has both reverse transcriptase and endonuclease activity. Based on in vitro studies of these two activities, R2 retrotransposition is a coupled DNA cleavage/reverse transcription reaction (Luan and Eickbush, “RNA Template Requirements for Target DNA-Primed Reverse Transcription by the R2 Retrotransposable Element,” Mol. Cell. Biol. 15:3882-3891 (1995); Luan and Eickbush, “Downstream 28S Gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell. Biol. 16:4726-4734 (1996); Mathews et al., “Secondary Structure Model of the RNA Recognized by the Reverse Transcriptase from the R2 Retrotransposable Element,” RNA 3:1-16 (1997); Yang and Eickbush, “RNA-induced Changes in the Activity of the Endonuclease Encoded by the R2 Retrotransposable Element,” Mol. Cell. Biol. 18:3455-3465 (1998); and Yang et al., “Identification of the Endonuclease Domain Encoded by R2 and Other Site-specific, non-Long Terminal Repeat Retrotransposable Elements,” Proc. Natl. Acad. Sci. USA 96:7847-7852 (1999)). The 3′ end generated by a first-strand cleavage (nick) of the DNA target site is used as a primer for reverse transcription of the RNA template. This utilization of the DNA target to prime cDNA synthesis has been called target-primed reverse transcription (“TPRT”). Removal of the RNA template and synthesis of the second DNA strand do not occur in vitro and are likely to involve the cellular DNA repair and replication machinery. While much has been learned about the TPRT reaction, the activity of a non-LTR element reverse transcriptase has not been characterized in the absence of its DNA target site.

The present invention is directed to overcoming the above-identified limitations of RT reactions performed using previously identified retroviral reverse transcriptases as well as other deficiencies in the art.

SUMMARY OF THE INVENTION

One aspect of the present invention relates to a method of preparing a cDNA molecule which includes: contacting an RNA molecule, in the presence of dNTPs, with a non-LTR retrotransposon protein or polypeptide having reverse transcriptase activity under conditions effective for production of a cDNA molecule complementary to the RNA molecule, said contacting being carried out in the absence of a target DNA molecule of the non-LTR retrotransposon protein or polypeptide; and isolating the cDNA molecule.

A second aspect of the present invention relates to a method of amplifying a cDNA molecule which includes: preparing a single-stranded cDNA molecule according to the present invention, wherein the single-stranded cDNA molecule includes a region of interest; annealing a first primer to the single-stranded cDNA molecule at a position 3′ of the region of interest; and extending the first primer to form a complementary DNA strand including a complement of the region of interest.

A third aspect of the present invention relates to a kit which can be used to prepare cDNA from RNA. The kit includes: a carrier device including one or more compartments adapted to receive one or more containers; and a first container which includes a non-LTR retrotransposon protein or polypeptide having reverse transcriptase activity. The kit may further include: one or more additional containers selected from the group consisting of (i) a second container which includes a buffer, (ii) a third container which includes dNTPs, (iii) a fourth container which includes donor RNA having a known sequence, and (iv) a fifth container which includes acceptor RNA having a known sequence.

A fourth aspect of the present invention relates to a pool of cDNA molecules prepared according to the method of preparing a cDNA molecule according to the present invention.

As used herein, “non-LTR retrotransposon protein or polypeptide” refers to naturally occurring proteins encoded by non-LTR retrotransposons and polypeptide fragments thereof which possess reverse transcriptase activity, as well as proteins or polypeptides derived therefrom which contain one or more amino acid substitutions that either enhance the reverse transcriptase activity thereof or have no deleterious effect thereon. A preferred class of non-LTR retrotransposon proteins or polypeptides are R2 proteins or polypeptides. Thus, as used herein, “R2 protein or polypeptide” refers to naturally occurring proteins encoded by R2 elements and polypeptide fragments thereof which possess reverse transcriptase activity, as well as proteins or polypeptides derived therefrom which contain one or more amino acid substitutions that either enhance the reverse transcriptase activity thereof or have no deleterious effect thereon.

Applicants have surprisingly discovered that the protein encoded by the R2 element of Bombyx mori, which has reverse transcriptase activity, has several unusual properties in the absence of its DNA target. It was previously shown that the protein encoded by the R2 element of Bombyx mori required its DNA target to carry out TPRT of its own RNA (Luan et al., “Reverse Transcription of R2Bm RNA is Primed by a Nick at the Chromosomal Target Site: A Mechanism for Non-LTR Retrotransposition,” Cell 72:595-605 (1993); Luan et al., “RNA Template Requirements for Target DNA-Primed Reverse Transcription by the R2 Retrotransposable Element,” Mol. Cell Biol. 15(7):3882-3891 (1995); Luan et al., “Downstream 28S Gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell Biol. 16(9):4726-4734 (1996), each of which is hereby incorporated by reference in its entirety). Because the R2 element protein can function as a reverse transcriptase in the absence of its target DNA, this protein (as well as polypeptide fragments thereof) can be used to prepare cDNA in a reverse transcription procedure of the present invention, which can then be followed by conventional amplification procedures to expand the copy number of the transcribed cDNAs. The present invention provides a number of benefits previously unrealized with reverse transcription procedures performed, for example, using retroviral or retroviral-derived proteins having reverse transcriptase activity. These include: (i) elimination of the need for sequence-specific primers, since R2 proteins or polypeptides have an ability to use the 3′ end of any RNA to prime cDNA synthesis; (ii) ability to combine cDNA copies from multiple RNA templates into a single cDNA strand, which is the result of the R2 protein or polypeptide propensity to jump between RNA templates in the absence of any sequence identity; and (iii) a propensity to completely or nearly completely copy the RNA template to form a population of cDNA molecules having a greater concentration of substantially full-length cDNAs (as compared to the population provided by retroviral or retroviral-derived proteins having reverse transcriptase activity).

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-B illustrate a mechanism of DNA cleavage during a TPRT assay for the R2 protein. FIG. 1A shows a diagram of the nucleic acid templates and products of the reactions conducted in FIG. 1B. In FIG. 1A, gray lines represent RNA templates; black lines represent DNA target; and dotted lines represent cDNA product. The DNA substrate is a uniformly ³²P-labeled 164 bp fragment. The two RNAs used as templates are either 254 nt in length, corresponding to the 3′ untranslated region of the R2 element from B. mori, or 274 nt in length containing an extra 20 nt of the 28S gene sequence downstream of the R2 insertion site. The TPRT reaction is initiated by a cleavage of the lower (noncoding) strand of the target DNA. The free DNA 3′ end released by this nick is used to prime reverse transcription starting at the 3′ end of the 254 nt template, or 20 nt from the 3′ end of the R2 sequence in the 274 nt template (see also diagram in FIG. 2). Thus, the TPRT product with both RNAs is ˜364 nt in length, including a 110 nt fragment of the lower target DNA strand and a ˜254 nt reverse transcript of the RNA template. FIG. 1B is an image of an autoradiograph of the reaction products separated on a 33 cm 6% denaturing polyacrylamide gel. The reactions contain 5 ng of the R2 protein, 20 ng of target DNA and 150 ng of RNA. Lane 1, 254 nt R2 RNA; lane 2, 274 nt R2 RNA. In addition to the ˜364 nt TPRT product, both lanes contain larger cDNA products (˜618 and ˜638 nt) that represent jumps to the end of a second RNA template.
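By way of illustration only, the product sizes recited for FIG. 1B follow from simple addition of the lengths given above. The Python sketch below assumes that each template jump copies the full length of the acceptor RNA and ignores any non-templated nucleotides at the junction; the function and variable names are hypothetical and are not part of the disclosure.

```python
# Illustrative size bookkeeping for the TPRT products described for FIG. 1.
# Lengths are taken from the figure description; names are hypothetical.

PRIMER_DNA_NT = 110  # fragment of the lower target DNA strand that acts as primer
R2_UTR_NT = 254      # reverse transcript of the R2 3' untranslated region


def tprt_product_length(primer_nt: int, transcript_nt: int) -> int:
    """Length of the primary TPRT product (DNA primer plus cDNA)."""
    return primer_nt + transcript_nt


def jump_product_length(primary_nt: int, acceptor_rna_nt: int) -> int:
    """Length after one template jump, assuming the whole acceptor RNA is copied."""
    return primary_nt + acceptor_rna_nt


primary = tprt_product_length(PRIMER_DNA_NT, R2_UTR_NT)
jump_254 = jump_product_length(primary, 254)  # lane 1, 254 nt R2 RNA
jump_274 = jump_product_length(primary, 274)  # lane 2, 274 nt R2 RNA
print(primary, jump_254, jump_274)
```

Under these assumptions the sketch reproduces the approximate sizes of the primary product and of the two template-jump products recited for lanes 1 and 2.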

FIG. 2 illustrates the various junction sequences resulting from the template jumps in FIG. 1B. The presumed mechanism for the generation of the ˜638 nt TPRT product in FIG. 1B is diagrammed at the top of this figure. Reverse transcription is initiated 20 nt from the 3′ end of the RNA at the beginning of the R2 3′ UTR. When the RT reaches the 5′ end of the first template (donor) it jumps to the free 3′ end of a second RNA template. To obtain the cDNA sequence corresponding to these jumps, the ˜638 nt TPRT product from FIG. 1B, lane 2, was purified from the gel and the junction region amplified by PCR using primers AB.18 (SEQ ID No: 31, Table I) and AB.19 (SEQ ID No: 32, Table I). At the bottom of the Figure are the junctions derived from six cloned PCR products. The top sequence represents the 5′ and 3′ ends of the 274 nt R2 RNA. Four of the junctions contain an extra nucleotide (nucleotides between the inner dotted vertical lines) while one junction contains a 6 nt internal deletion of the acceptor RNA. SEQ ID Nos: 1 and 2 are, respectively, the 3′-terminal and 5′-terminal sequences of the 274 base R2 RNA. SEQ ID Nos: 3-6 are the nucleotide sequences of the junction region of cDNA RT products.

FIG. 3 is a graph illustrating the effects of RNA concentration on the efficiency of the template jumping reaction. TPRT reactions were performed similarly to that in FIG. 1 with the 254 nt R2 RNA template concentration varied from 0.4-40 nM. Products of TPRT reactions were separated on a 6% denaturing polyacrylamide gel and the intensity of the ˜618 nt fragment was determined relative to the total level of TPRT (˜364 and ˜618 nt bands) using a PhosphorImager and Image Quant.
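The relative measure described for FIG. 3 can be sketched as follows. This is a minimal illustration, assuming that background-corrected band intensities are already available as numbers; the function name and the example intensities are hypothetical and are not data from the figure.

```python
def jump_fraction(intensity_364: float, intensity_618: float) -> float:
    """Fraction of total TPRT signal found in the ~618 nt template-jump band."""
    total = intensity_364 + intensity_618
    if total <= 0:
        raise ValueError("total TPRT signal must be positive")
    return intensity_618 / total


# Hypothetical band intensities (arbitrary units), for illustration only.
example = jump_fraction(intensity_364=750.0, intensity_618=250.0)
print(example)  # 0.25
```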

FIG. 4 is an image of an autoradiograph which illustrates that template jumps during the TPRT reaction are specific to R2 RNA templates. The reactions were conducted as in FIG. 1 except that the RNA templates added to the reactions were: lane 1, 50 ng of the 274 nt R2 RNA (20 nM); lane 2, 20 nM 274 nt R2 RNA and 80 nM 334 nt vector RNA; lane 3, 20 nM 274 nt R2 RNA and 160 nM 334 nt vector RNA. The competing vector RNA would give rise to TPRT products of about 445 nt. Template jumps of the major TPRT product to vector RNA would give rise to a ˜700 nt band.

FIGS. 5A-B illustrate how reverse transcription can be primed by RNA itself. FIG. 5A is an autoradiograph illustrating the reverse transcription of the 334 nt vector RNA in the absence of the DNA target site. All reactions were conducted with 300 ng of the 334 nt vector RNA in the presence of [α-³²P] dCTP. Lane 1, primer extension reaction in which 50 ng primer AB.23 (SEQ ID No: 33, Table I) was pre-annealed to the RNA template. The reverse transcription products were treated with 2 μg RNase A for 10 min at 37° C. before electrophoresis. Lane 2, reaction identical to that in lane 1, but no primer was annealed to the RNA template. Lane 3, reaction conditions identical to that in lane 2 (no DNA primer), but the reaction products were not treated with RNase A before electrophoresis. FIG. 5B is a schematic diagram of the reverse transcription reactions shown in FIG. 5A. The top diagram shows a simple primer extension assay (lane 1) giving rise to a 334 nt cDNA primary product followed by a template jump to generate a ˜668 nt product. The bottom diagram shows a reverse transcription reaction that is primed by another RNA molecule. The products of this RNA-primed reaction will be ˜334 nt and ˜668 nt if the products are treated with RNase A before electrophoresis (lane 2), and ˜668 nt and ˜1000 nt if not treated with RNase A (lane 3). The self-complementary covalent RNA/cDNA hybrid migrates as a diffuse band at ˜600 nt in lane 3, because secondary structures form that affect migration. RNA migrates with a different mobility than single-stranded DNA on these gels. The cDNA products in lane 2 are about 10 nt shorter than those in lane 1, suggesting that RNA-primed cDNA synthesis initiates about 10 nt from the 3′ end of the template.

FIGS. 6A-B illustrate the specificity of the RNA-primed reverse transcription and template jumping reactions. FIG. 6A is an autoradiograph of the reaction products obtained for the RNA-primed reverse transcription and template jumping reactions. All reactions were conducted in the presence of [α-³²P] dCTP and the absence of DNA primers, and all products were treated with RNase A before electrophoresis. Lane 1, 2.4 pmoles 183 nt vector RNA; lane 2, 2.4 pmoles 254 nt R2 RNA; lane 3, 2.4 pmoles 334 nt vector RNA; lane 4, 2.4 pmoles 183 nt vector RNA + 2.4 pmoles 254 nt R2 RNA; and lane 5, 2.4 pmoles 254 nt R2 RNA + 2.4 pmoles 334 nt vector RNA. The efficiencies of the RNA-priming and of the template jumps were quantified and are presented in Table 2 (see Example 5). FIG. 6B illustrates the junction sequences of template jumps from the 334 nt vector RNA to the 254 nt R2 RNA. An aliquot of the total reaction products shown in lane 5 above was PCR amplified using primers AB.23 (SEQ ID No: 33, Table I) and AB.2b (SEQ ID No: 26, Table I). The PCR products were cloned and random clones sequenced. Shown at the top are the 3′ end of the 254 nt RNA template (SEQ ID No: 7) and the 5′ end of the 334 nt RNA template (SEQ ID No: 8) used in the reverse transcription reaction. Below these sequences are seven clones derived from the cDNA (SEQ ID Nos: 9-15). Most of the junctions contain additional nucleotides not derived from either of the RNA templates (nucleotides between the dotted vertical lines). Not all cDNA products extended to the end of the 334 nt templates (the numbers of bases deleted are given), but these jumps did not involve short segments of sequence identity with the acceptor RNA.

FIGS. 7A-B illustrate the effects of directed template jumps between ‘donor’ and ‘acceptor’ RNA templates. In FIG. 7B, an image of an autoradiograph illustrates directed template jumps to short ‘acceptor’ RNA templates. Reverse transcription in each reaction was initiated from a ³²P-end-labeled primer AB.9 (SEQ ID No: 28, Table I) (30 ng) annealed to 30 ng 177 nt ‘donor’ RNA template. Lane 1 contained 400 ng 334 nt vector RNA, lane 2 contained 400 ng 183 nt vector RNA, and lane 3 contained no acceptor RNA. In FIG. 7A, an image of an autoradiograph illustrates directed template jumps to longer ‘acceptor’ RNA templates. Reverse transcription was again initiated from the ³²P-end-labeled primer AB.9 (SEQ ID No: 28, Table I) (50 ng) annealed to 50 ng 177 nt ‘donor’ RNA template. Lane 1, 300 ng 334 nt vector RNA; lane 2, 300 ng 600 nt RNA; and lane 3, 300 ng 1090 nt RNA.

FIGS. 8A-B illustrate the effect of the DNA target in stabilizing interactions between the R2 protein and its RNA template. All components were pre-incubated for 15 minutes at 37° C. and separated on 5% native polyacrylamide gels at 4° C. In FIG. 8A, all lanes contain 10 ng of ³²P-labeled 254 nt R2 RNA and 10 ng of R2 protein. Lane 1, no other additions; lane 2, 100 ng of the 164 bp target DNA; lane 3, 100 ng of a 50 nt DNA oligonucleotide AB.17 (SEQ ID No: 30, Table I); lane 4, 100 ng of DdeI digested pBSII(SK−) DNA. In FIG. 8B, lane 1, 20 ng ³²P-labeled target DNA, 15 ng R2 protein, and 100 ng 254 nt R2 RNA; lane 2, 20 ng ³²P-labeled target DNA and 15 ng R2 protein; lane 3, 10 ng of ³²P-labeled 254 nt R2 RNA, 10 ng of R2 protein and 100 ng of the 164 bp target DNA.

FIG. 9A is an autoradiograph which illustrates that template jumps can occur onto single-stranded DNA. Each reverse transcription reaction contained 150 ng 254 nt R2 RNA pre-annealed with various amounts of the DNA primer AB.2b (19 nt) (SEQ ID No: 26, Table I). Lane 1, 500 ng primer AB.2b (SEQ ID No: 26, Table I); lane 2, 50 ng primer AB.2b (SEQ ID No: 26, Table I); lane 3, 5 ng primer AB.2b (SEQ ID No: 26, Table I); lane 4, 50 ng primer AB.2b (SEQ ID No: 26, Table I) + 250 ng ssDNA AB.17 (SEQ ID No: 30, Table I) (50 nt); lane 5, 50 ng primer AB.2b (SEQ ID No: 26, Table I) + 250 ng ssDNA AB.26 (SEQ ID No: 35, Table I) (54 nt). FIG. 9B illustrates the junction sequences of template jumps from the 254 nt R2 RNA to the 54 nt ssDNA. The ˜310 nt band in lane 5 (panel A) was excised from the gel, the DNA eluted and PCR amplified using primers AB.23 (SEQ ID No: 33, Table I) and AB.2b (SEQ ID No: 26, Table I). The PCR products were cloned and random clones sequenced. Shown at the top are the 3′ end of the 54 nt ssDNA acceptor (SEQ ID No: 16) and the 5′ end of the 254 nt R2 RNA donor (SEQ ID No: 17). Below these sequences are the five sequenced junctions, two of which were identical (SEQ ID No: 18). Three of the sequences (SEQ ID Nos: 19-21) possessed junctions that contain additional nucleotides not derived from either the donor RNA or the acceptor ssDNA (nucleotides between the dotted vertical lines).

FIG. 10 is an image of an autoradiograph which illustrates a comparison of the template jumping activity of the R2 and AMV RTs. The reactions contained 150 ng of a 283 nt R2 RNA pre-annealed with 250 ng DNA primer AB.2b (SEQ ID No: 26, Table I). Reactions were conducted under identical conditions except that lane 1 contained 5 U of AMV RT (Promega) and lane 2 contained ˜10 ng of R2 protein.

FIGS. 11A-C are schematic models which correlate the unusual abilities of the R2 RT with its structural differences from retroviral RT. Gray lines represent RNA; dotted lines represent cDNA; black lines represent DNA; and rounded rectangles represent protein. The active site of the RT is indicated by a diffuse shaded region. FIG. 11A is a comparison of the structure of the R2 and HIV RTs. The HIV structure is a simplified depiction of the detailed crystallographic studies (Kohlstaedt et al., “Crystal Structure at 3.5 Angstrom Resolution of HIV-1 Reverse Transcriptase Complexed with an Inhibitor,” Science 256:1783-1790 (1992); Sarafianos et al., “Crystal Structure of HIV-1 Reverse Transcriptase in Complex with a Polypurine Tract RNA:DNA,” EMBO J. 20:1449-1461 (2001), each of which is hereby incorporated by reference in its entirety). The R2 protein lacks an RNase H domain and has additional segments in the ‘fingers and palm’ regions of the RT domain. Therefore, unlike the retroviral protein, the R2 protein is depicted as containing most of its affinity for the RNA template upstream of the active site (shaded region). FIG. 11B is a summary of the unusual properties of the R2 protein. Because the 3′ end of free RNA can bind downstream of the active site, it can be used to prime reverse transcription. Template jumping is possible because the template binding site upstream of the active site can bind a second RNA before the protein dissociates after reverse transcribing the first RNA template. FIG. 11C illustrates the similarity of RNA-priming and template jumping to models of the integration reaction of R2. RNA-priming can be viewed as similar to the signature step of the TPRT reaction. When the R2 protein is bound to the DNA target site, the 3′ end of the cleaved DNA strand can bind downstream of the RT active site. Meanwhile, template jumps may be similar to one mechanism proposed for the attachment of the R2 sequence to the upstream target site after second strand cleavage (Burke et al., “The Domain Structure and Retrotransposition Mechanism of R2 Elements are Conserved Throughout Arthropods,” Mol. Biol. Evol. 16:502-511 (1999), which is hereby incorporated by reference in its entirety). While there is no direct biochemical evidence for this step, it represents the model that best explains the sequence variation found at the 5′ end of endogenous R2 elements (Burke et al., “The Domain Structure and Retrotransposition Mechanism of R2 Elements are Conserved Throughout Arthropods,” Mol. Biol. Evol. 16:502-511 (1999); and George et al., “Analysis of the 5′ Junctions of R2 Insertions with the 28S gene: Implications for non-LTR Retrotransposition,” Genetics 142:853-863 (1996), each of which is hereby incorporated by reference in its entirety). For both TPRT and 5′ attachment, the DNA strands are drawn partially denatured by the R2 protein, as it would seem most similar to what has been shown to occur with RNA.

FIG. 12 is a diagram which illustrates a method of preparing cDNA according to the present invention, which may include RNA-priming and template jumping steps. Gray lines represent RNA templates; and dotted lines represent cDNA product. The initial components of the reaction are RNA (either template RNA, donor RNA, or acceptor RNA), the non-LTR retrotransposon RT, and dNTPs. The R2 RT can use the 3′ end of one RNA molecule to prime reverse transcription of a second RNA molecule (RNA priming). The two RNA molecules can be the same or different. After reverse transcription to the end of the second RNA, R2 RT can jump to a third RNA molecule (again, the same or different) and continue reverse transcription (template jumping). Neither RNA-priming nor template jumping requires sequence identity between the RNAs involved.

FIG. 13 is an autoradiograph which illustrates a processivity assay comparing R2 and AMV RTs on a 600 nt RNA template. 5′ end-labeled AB.23 primer (SEQ ID No: 33, Table I) was annealed to the 600 nt vector RNA template as described above. Each lane contained 50 fmole of annealed RNA/DNA-primer. Reactions were started with the addition of 5 μl of 1.25 mM dNTP and then stopped after 5 min incubation at 37° C. In those reactions with R2 RT, 2 ng (20 fmole) of R2 RT was preincubated with the RNA/primer for 5 min at 37° C. in 50 mM Tris-HCl (pH 7.5), 0.2 M NaCl, 10 mM MgCl₂, 2.5 mM DTT, 0.01% Triton X-100 in a final volume of 25 μl. Lane 1, no other additions; lane 2, the preincubation mixture also contained 2.5 μl of “trap” (20 μg of heparin, ˜1 μg of poly(rA)/poly(dT)13-18); lane 3, after preincubation the “trap” was added at the start of the reaction (addition of dNTP). In those reactions with AMV RT, 2.5 U of AMV RT (Promega) was preincubated with the RNA/primer for 5 min at 37° C. in 50 mM Tris-HCl (pH 8.3), 50 mM KCl, 5 mM MgCl₂, 5 mM DTT, 0.5 mM spermidine, in a final volume of 25 μl. Lane 1, no other additions; lane 2, the preincubation mixture also contained 2.5 μl of “trap” (20 μg of heparin, ˜1 μg of poly(rA)/poly(dT)13-18); lane 3, after preincubation the “trap” was added at the start of the reaction (addition of dNTP).
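As an illustrative check only, the mass and molar amounts recited for the R2 protein (e.g., 2 ng given as about 20 fmole) can be compared with the approximately 120 kilodalton size stated for the R2 protein in the Background. The Python sketch below assumes that molecular mass; the function name is hypothetical.

```python
R2_PROTEIN_DA = 120_000.0  # ~120 kDa, the size stated for the protein encoded by R2


def ng_to_fmol(mass_ng: float, molecular_mass_da: float) -> float:
    """Convert a mass in nanograms to femtomoles for a given molecular mass."""
    if molecular_mass_da <= 0:
        raise ValueError("molecular mass must be positive")
    # ng -> g is a factor of 1e-9 and mol -> fmol is 1e15, a net factor of 1e6.
    return mass_ng * 1e6 / molecular_mass_da


r2_fmol = ng_to_fmol(2.0, R2_PROTEIN_DA)
print(round(r2_fmol, 1))  # 16.7
```

The computed value is of the same order as the rounded figure of about 20 fmole recited for 2 ng of R2 RT.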

FIGS. 14A-B are scans of the RNA processivity assay using 600 nt (14A) and 1094 nt (14B) RNA templates. FIG. 14A is the PhosphorImager scan of lane 3 from FIG. 13. The approximate size of the cDNA products can be calculated relative to 100 nt size standards. FIG. 14B is a processivity assay similar to that conducted in FIG. 13, lane 3 (‘trap’ reactions), except that the RNA is a 1094 nt vector RNA and the reverse transcription reaction was primed with the end-labeled AB.34 (SEQ ID No: 38, Table I). For each primer extension reaction, 50 fmole of annealed RNA/DNA-primer was used under the same conditions as described for lane 3 in FIG. 13. The products of reverse transcription were separated on 6% denaturing PAGE, scanned using a PhosphorImager and analyzed using Image Quant.

FIGS. 15A-B illustrate the dissociation rates of R2 RT and AMV RT from an RNA template. FIG. 15A is an autoradiograph of different primer extension reactions, which were allowed to proceed for varying lengths of time. End-labeled AB.8 (SEQ ID No: 27, Table I) DNA primer was annealed to a 183 nt RNA template. For each primer extension reaction, about 50 fmole of annealed RNA/DNA-primer was used. The complex dissociation, as a function of time, was assayed by yield of primer extension. For the R2 RT reaction, 50 fmole of the template/primer was preincubated with 2 ng (20 fmole) of R2 RT for 15 min at 37° C. under the same conditions as in FIG. 13. After preincubation, 2.5 μl of the “trap” (20 μg of heparin, ˜1 μg of poly(rA)/poly(dT)13-18) was added to the preincubation mixture. The addition of the “trap” is considered time 0. The mixture was incubated at 37° C. for the lengths of time indicated before the addition of 2.5 μl of 2.5 mM dNTP to start the reaction. All polymerization reactions were conducted for 4 min at 37° C. The products of the reactions were separated on a 7% denaturing polyacrylamide gel and analyzed as described above using PhosphorImager and Image Quant. The fraction of enzyme which remained bound to the template as a function of time was determined based on the assumption that the cDNA yield is proportional to the fraction of enzyme that is bound to the template at the moment of addition of dNTP. Reactions with the AMV RT were conducted like those with the R2 RT, except that each reaction contained 2.5 U of AMV RT (Promega) and the preincubation mixture was that recommended for AMV (see FIG. 13). FIG. 15B is a graph comparing the R2 and AMV RT dissociation rates from an RNA template. Open squares, data for R2 RT (average of three experiments); solid circles, data for AMV RT.
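The proportionality assumption stated for FIG. 15 can be sketched as follows. The normalization to the time 0 yield follows the description above; the additional step of estimating an off-rate assumes simple first-order dissociation, which is an assumption made here for illustration and is not stated in the figure description. All names and example values are hypothetical.

```python
import math


def fraction_bound(yields):
    """Normalize cDNA yields to the time-0 yield (yield taken as proportional to bound enzyme)."""
    y0 = yields[0]
    if y0 <= 0:
        raise ValueError("time-0 yield must be positive")
    return [y / y0 for y in yields]


def first_order_off_rate(times, fractions):
    """Least-squares slope of ln(fraction) versus time through the origin.

    Assumes first-order dissociation, fraction = exp(-k * t); returns k.
    """
    pairs = [(t, f) for t, f in zip(times, fractions) if t > 0]
    if not pairs or any(f <= 0 for _, f in pairs):
        raise ValueError("need positive fractions at one or more times > 0")
    num = sum(t * math.log(f) for t, f in pairs)
    den = sum(t * t for t, _ in pairs)
    return -num / den


# Hypothetical yields (arbitrary units) generated with k = 0.05 per minute.
times_min = [0.0, 5.0, 10.0, 20.0]
yields = [100.0 * math.exp(-0.05 * t) for t in times_min]
k = first_order_off_rate(times_min, fraction_bound(yields))
half_life_min = math.log(2) / k
print(k, half_life_min)
```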

FIGS. 16A-B illustrate the elongation rate of R2 RT. FIG. 16A is an autoradiograph illustrating the reaction products of a primed RT reaction. The template in the reaction of FIG. 16A is the 1094 nt RNA, with synthesis primed by end-labeled AB.34 (SEQ ID No: 38, Table I). For each lane, 125 fmole of template/primer was preincubated with 10 ng (100 fmole) of R2 RT for 5 min at 37° C. After preincubation, the reverse transcription was started by addition of 5 μl of 1.25 mM dNTP and stopped at the time in seconds indicated at the top of the Figure by quick mixing with 3 volumes of ethanol containing 0.3 M sodium acetate (pH 5.2) and 1% SDS. After precipitation, the products were separated on 6% denaturing PAGE and analyzed using PhosphorImager and Image Quant. FIG. 16B illustrates a similar elongation assay using the 600 nt RNA primed with AB.23 (SEQ ID No: 33, Table I) and either the R2 or AMV RT. Plotted in this Figure are the longest polymerization products detected on denaturing polyacrylamide gels like that in FIG. 16A. The maximal rate of elongation was determined by fitting the data points with a linear function. Circles, the R2 data points; triangles, the AMV data points.
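The linear fit described for FIG. 16B can be illustrated with an ordinary least-squares line through the longest product length at each time point, the slope being taken as the elongation rate. The following Python sketch is one way such a fit might be carried out; the function name and the data points are hypothetical and are not values from the figure.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: reaction time (s) and longest cDNA product observed (nt).
times_s = [10.0, 20.0, 30.0, 40.0]
longest_nt = [120.0, 220.0, 320.0, 420.0]
rate_nt_per_s, offset_nt = linear_fit(times_s, longest_nt)
print(rate_nt_per_s, offset_nt)
```

For these made-up points the fitted slope is 10 nt per second with an intercept of 20 nt.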

FIG. 17A is an autoradiograph which illustrates how R2 RT is unaffected by RNA secondary structure. The RNA template for this reaction has an annealed primer near the middle of the RNA and a longer RNA block annealed to the 5′ end of the RNA (FIG. 17B). This template was formed by the annealing of 300 ng (3 pmol) of 334 nt RNA, 25 ng (4 pmole) of end-labeled DNA primer AB.8 (SEQ ID No: 27, Table I), and 300 ng (8 pmol) of 117 nt RNA. The DNA template for the production of the 117 nt RNA was pBSK(SK−) digested with KpnI. RNA was synthesized with T3 RNA polymerase. The procedure of annealing was similar to that of all other annealing reactions described above. The 334 nt RNA template without the RNA block was prepared in a similar manner. For each primer extension reaction, about 300 fmole of annealed template was preincubated with either 2 ng (20 fmole) of R2 or 1.5 U of AMV RT (Promega). Preincubations with enzymes were for 5 min at 37° C. After preincubation, reverse transcription was started by addition of 5 μl of 1.25 mM dNTP, and stopped after 5 min at 37° C. In the processive runs (left panel) 2.5 μl of “trap” (20 μg of heparin, ˜1 μg of poly(A)/poly(dT)13-18) was added along with 2.5 μl of 2.5 mM dNTP to start the reaction.

FIG. 18 is an autoradiograph which illustrates the effects of temperature on RT processivity. The template and reaction conditions are exactly like those described with respect to FIG. 13 for a processive run (i.e., with the trap), except that preincubation was for 20 min at 25° C. followed by a short (2 min) equilibration at the new temperature (25° C.-55° C. in 5° C. increments). The elongation reaction was started by the addition of dNTPs and the trap and was conducted for 5 minutes.

FIG. 19 is a graph which illustrates the percentage of the cDNA products that are full-length (600 nt) as a function of temperature. The graph is an analysis of the data in FIG. 18. Plotted is the fraction of the total cDNA (all cDNA between 100 and 600 nt in FIG. 18) that corresponds to full length (600 nt). The left panel is for R2 RT and the right panel is for AMV RT.
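The quantity plotted in this Figure can be expressed as a simple ratio. The sketch below is illustrative only: the function name and the band intensities are hypothetical placeholders, and it assumes that the gel signal has already been quantified per product length.

```python
def fraction_full_length(intensity_by_length, full_length=600, min_length=100):
    """Fraction of the total cDNA signal (min_length..full_length nt)
    that is found in the full-length product."""
    total = sum(signal for length, signal in intensity_by_length.items()
                if min_length <= length <= full_length)
    if total == 0:
        raise ValueError("no cDNA signal in the counted size range")
    return intensity_by_length.get(full_length, 0.0) / total


# Hypothetical quantified signals keyed by product length in nt.
# Products shorter than 100 nt are excluded from the total.
lane = {50: 5.0, 150: 30.0, 300: 25.0, 450: 23.0, 600: 22.0}
fraction = fraction_full_length(lane)
```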

DETAILED DESCRIPTION OF THE INVENTION

Preferred non-LTR retrotransposon proteins or polypeptides are the proteins or polypeptides of R2 elements.

One preferred protein possessing reverse transcriptase activity andencoded by a non-LTR retrotransposable R2 element is the protein encodedby the R2 element of Bombyx mori. This protein has an amino acidsequence corresponding to SEQ ID No: 22 as follows: Met Met Ala Ser ThrAla Leu Ser Leu Met Gly Arg                   5                  10 CysAsn Pro Asp          15 Gly Cys Thr Arg Gly Lys His Val Thr Ala Ala Pro             20                  25 Met Asp Gly Pro      30 Arg Gly ProSer Ser Leu Ala Gly Thr Phe Gly Trp          35                  40 GlyLeu Ala Ile  45 Pro Ala Gly Glu Pro Cys Gly Arg Val Cys Ser Pro     50                  55                  60 Ala Thr Val Gly Phe PhePro Val Ala Lys Lys Ser Asn Lys Glu Asn 65                  70                  75 Arg Pro Glu Ala             80 Ser Gly Leu Pro Leu Glu Ser Glu Arg Thr Gly Asp                 85                  90 Asn Pro Thr Val          95 ArgGly Ser Ala Gly Ala Asp Pro Val Gly Gln Asp            100                 105 Ala Pro Gly Trp     110 Thr Cys GlnPhe Cys Glu Arg Thr Phe Ser Thr Asn         115                 120 ArgGly Leu Gly 125 Val His Lys Arg Arg Ala His Pro Val Glu Thr Asn    130                 135                 140 Thr Asp Ala Ala Pro MetMet Val Lys Arg Arg Trp His Gly Glu Glu145                 150                 155 Ile Asp Leu Leu            160 Ala Arg Thr Glu Ala Arg Leu Leu Ala Glu Arg Gly                165                 170 Gln Cys Ser Gly         175 GlyAsp Leu Phe Gly Ala Leu Pro Gly Phe Gly Arg            180                 185 Thr Leu Glu Ala     190 Ile Lys GlyGln Arg Arg Arg Glu Pro Tyr Arg Ala         195                 200 LeuVal Gln Ala 205 His Leu Ala Arg Phe Gly Ser Gln Pro Gly Pro Ser    210                 215                 220 Ser Gly Gly Cys Ser AlaGlu Pro Asp Phe Arg Arg Ala Ser Gly Ala225                 230                 235 Glu Glu Ala Gly            240 Glu Glu Arg Cys Ala Glu Asp Ala Ala Ala Tyr Asp                245                 250 Pro 
Ser Ala Val         255 GlyGln Met Ser Pro Asp Ala Ala Arg Val Leu Ser            260                 265 Glu Leu Leu Glu     270 Gly Ala GlyArg Arg Arg Ala Cys Arg Ala Met Arg         275                 280 ProLys Thr Ala 285 Gly Arg Arg Asn Asp Leu His Asp Asp Arg Thr Ala    290                 295                 300 Ser Ala His Lys Thr SerArg Gln Lys Arg Arg Ala Glu Tyr Ala Arg305                 310                 315 Val Gln Glu Leu            320 Tyr Lys Lys Cys Arg Ser Arg Ala Ala Ala Gln Val                325                 330 Ile Asp Gly Ala         335 CysGly Gly Val Gly His Ser Leu Glu Glu Met Glu            340                 345 Thr Tyr Trp Arg     350 Pro Ile LeuGlu Arg Val Ser Asp Ala Pro Gly Pro         355                 360 ThrPro Glu Ala 365 Leu His Ala Leu Gly Arg Ala Glu Trp His Gly Gly    370                 375                 380 Asn Arg Asp Tyr Thr GlnLeu Trp Lys Pro Ile Ser Val Glu Glu Ile385                 390                 395 Lys Ala Ser Arg            400 Phe Asp Trp Arg Thr Ser Pro Gly Pro Asp Gly Ile                405                 410 Arg Ser Gly Gln         415 TrpArg Ala Val Pro Val His Leu Lys Ala Glu Met            420                 425 Phe Asn Ala Trp     430 Met Ala ArgGly Glu Ile Pro Glu Ile Leu Arg Gln         435                 440 CysArg Thr Val 445 Phe Val Pro Lys Val Glu Arg Pro Gly Gly Pro Gly    450                 455                 460 Glu Tyr Arg Pro Ile SerIle Ala Ser Ile Pro Leu Arg His Phe His465                 470                 475 Ser Ile Leu Ala            480 Arg Arg Leu Leu Ala Cys Cys Pro Pro Asp Ala Arg                485                 490 Gln Arg Gly Phe         495 IleCys Ala Asp Gly Thr Leu Glu Asn Ser Ala Val            500                 505 Leu Asp Ala Val     510 Leu Gly AspSer Arg Lys Lys Leu Arg Glu Cys His         515                 520 ValAla Val Leu 525 Asp Phe Ala Lys Ala Phe Asp Thr Val Ser His Glu    530                 535                 540 Ala Leu Val Glu Leu 
LeuArg Leu Arg Gly Met Pro Glu Gln Phe Cys545                 550                 555 Gly Tyr Ile Ala            560 His Leu Tyr Asp Thr Ala Ser Thr Thr Leu Ala Val                565                 570 Asn Asn Glu Met         575 SerSer Pro Val Lys Val Gly Arg Gly Val Arg Gln            580                 585 Gly Asp Pro Leu     590 Ser Pro IleLeu Phe Asn Val Val Met Asp Leu Ile         595                 600 LeuAla Ser Leu 605 Pro Glu Arg Val Gly Tyr Arg Leu Glu Met Glu Leu    610                 615                 620 Val Ser Ala Leu Ala TyrAla Asp Asp Leu Val Leu Leu Ala Gly Ser625                 630                 635 Lys Val Gly Met 640 Gln GluSer Ile Ser Ala Val Asp Cys Val Gly Arg                645                 650 Gln Met Gly Leu         655 ArgLeu Asn Cys Arg Lys Ser Ala Val Leu Ser Met            660                 665 Ile Pro Asp Gly     670 His Arg LysLys His His Tyr Leu Thr Glu Arg Thr         675                 680 PheAsn Ile Gly 685 Gly Lys Pro Leu Arg Gln Val Ser Cys Val Glu Arg    690                 695                 700 Trp Arg Tyr Leu Gly ValAsp Phe Glu Ala Ser Gly Cys Val Thr Leu705                 710                 715 Glu His Ser Ile            720 Ser Ser Ala Leu Asn Asn Ile Ser Arg Ala Pro Leu                725                 730 Lys Pro Gln Gln         735 ArgLeu Glu Ile Leu Arg Ala His Leu Ile Pro Arg            740                 745 Phe Gln His Gly     750 Phe Val LeuGly Asn Ile Ser Asp Asp Arg Leu Arg         755                 760 MetLeu Asp Val 765 Gln Ile Arg Lys Ala Val Gly Gln Trp Leu Arg Leu    770                 775                 780 Pro Ala Asp Val Pro LysAla Tyr Tyr His Ala Ala Val Gln Asp Gly785                 790                 795 Gly Leu Ala Ile            800 Pro Ser Val Arg Ala Thr Ile Pro Asp Leu Ile Val                805                 810 Arg Arg Phe Gly         815 GlyLeu Asp Ser Ser Pro Trp Ser Val Ala Arg Ala            820                 825 Ala Ala Lys Ser     830 Asp Lys IleArg Lys 
Lys Leu Arg Trp Ala Trp Lys         835                 840 GlnLeu Arg Arg 845 Phe Ser Arg Val Asp Ser Thr Thr Gln Arg Pro Ser    850                 855                 860 Val Arg Leu Phe Trp ArgGlu His Leu His Ala Ser Val Asp Gly Arg865                 870                 875 Glu Leu Arg Glu            880 Ser Thr Arg Thr Pro Thr Ser Thr Lys Trp Ile Arg                885                 890 Glu Arg Cys Ala         895 GlnIle Thr Gly Arg Asp Phe Val Gln Phe Val His            900                 905 Thr His Ile Asn     910 Ala Leu ProSer Arg Ile Arg Gly Ser Arg Gly Arg         915                 920 ArgGly Gly Gly 925 Glu Ser Ser Leu Thr Cys Arg Ala Gly Cys Lys Val    930                 935                 940 Arg Glu Thr Thr Ala HisIle Leu Gln Gln Cys His Arg Thr His Gly945                 950                 955 Gly Arg Ile Leu            960 Arg His Asn Lys Ile Val Ser Phe Val Ala Lys Ala                965                 970 Met Glu Glu Asn         975 LysTrp Thr Val Glu Leu Glu Pro Arg Leu Arg Thr            980                 985 Ser Val Gly Leu     990 Arg Lys ProAsp Ile Ile Ala Ser Arg Asp Gly Val         995                1000 GlyVal Ile Val 1005 Asp Val Gln Val Val Ser Gly Gln Arg Ser Leu Asp   1010                1015                1020 Glu Leu His Arg Glu LysArg Asn Lys Tyr Gly Asn His Gly Glu Leu1025               1030                1035 Val Glu Leu Val           1040 Ala Gly Arg Leu Gly Leu Pro Lys Ala Glu Cys Val               1045                1050 Arg Ala Thr Ser        1055 CysThr Ile Ser Trp Arg Gly Val Trp Ser Leu Thr           1060                1065 Ser Tyr Lys Glu    1070 Leu Arg SerIle Ile Gly Leu Arg Glu Pro Thr Leu        1075                1080 GlnIle Val Pro 1085 Ile Leu Ala Leu Arg Gly Ser His Met Asn Trp Thr   1090                1095                1100 Arg Phe Asn Gln Met ThrSer Val Met Gly Gly Gly Val Gly 1105               1110

This protein is further characterized as also possessing endonucleaseactivity. It is encoded by a DNA molecule having a nucleotide sequencecorresponding to SEQ ID No: 23 as follows. atgatggcga gcaccgcactgtcccttatg ggacggtgta 60 acccggatgg ctgtacacgt ggtaaacacg tgacagcagccccgatggac ggaccgcgag 120 gaccgtcaag cctagcaggt accttcgggt ggggccttgcgatacctgcg ggcgaaccct 180 gtggtcgggt ttgcagcccg gccacagtgg gtttttttcctgttgcaaaa aagtcaaata 240 aagaaaatag acctgaagcc tctggcctcc cgctggagtcagagaggaca ggcgataacc 300 cgactgtgcg gggttccgcc ggcgcagatc ctgtgggtcaggatgcgcct ggttggacct 360 gccagttctg cgaacgaacc ttttcgacca acaggggtttgggtgtccac aagcgtagag 420 cccaccctgt tgagaccaat acggatgccg ctccgatgatggtgaagcgg cggtggcatg 480 gcgaggaaat cgacctcctc gctcgcaccg aggccaggttgctcgctgag cggggtcagt 540 gctcgggtgg agacctcttt ggcgcgcttc cagggtttggaagaactctg gaagcgatta 600 agggacaacg gcggagggag ccttatcggg cattggtgcaagcgcacctt gcccgatttg 660 gttcccagcc gggtccctcg tcgggggggt gctcggccgagcctgacttc cggcgggctt 720 ctggagctga ggaagcgggc gaggaacgat gcgccgaagacgccgctgcc tatgatccat 780 ccgcagtcgg tcagatgtcg cccgatgccg ctcgggttctctccgaactc cttgagggtg 840 cggggagaag acgagcgtgc agggctatga gacccaagactgcagggcgg cgaaacgatt 900 tgcacgatga tcggacagct agtgcccaca aaaccagtagacaaaagcgc agggcagagt 960 acgcgcgtgt gcaggaactg tacaagaagt gtcgcagcagagcagcagct gaggtgatcg 1020 atggcgcgtg tgggggtgtc ggacactcgc tcgaggagatggagacctat tggcgaccta 1080 tcctcgagag agtgtccgat gcacctgggc ctacaccggaagctcttcac gccctagggc 1140 gtgcggagtg gcacgggggc aatcgcgact acacccagctgtggaagccg atctcggtgg 1200 aagagatcaa ggcctcccgc tttgactggc gaacttcgccgggcccggac ggtatacgtt 1260 cgggtcagtg gcgtgcggtt cctgtgcact tgaaggcggaaatgttcaat gcatggatgg 1320 cacgaggcga aatacccgaa attctacggc agtgccgaaccgtctttgta cctaaggtgg 1380 agagaccagg tggaccgggg gaatatcgac cgatctcgatcgcgtcgatt cccctgagac 1440 actttcactc catcttggcc cggaggctgt tggcttgctgcccccctgat gcacgacagc 1500 gcggatttat ctgcgccgac ggtacgctgg agaattccgcagtactggac gcggtgcttg 1560 gggatagcag gaagaagctg cgggaatgtc 
acgtggcggtgctagacttc gccaaggcat 1620 ttgacacagt gtctcacgag gcacttgtcg aattgctgaggttgaggggc atgcccgaac 1680 agttctgcgg ctacattgct cacctatacg atacggcgtccaccacctta gccgtgaaca 1740 atgaaatgag cagccctgta aaagtgggac gaggggttcgtcaaggggac cctctgtcgc 1800 cgatactctt caacgtggtg atggacctca tcctggcttccctgccggag agggtcgggt 1860 ataggttgga gatggaactc gtgtccgctc tggcctatgctgacgaccta gtcctgcttg 1920 cggggtcgaa ggtagggatg caggagtcca tctctgctgtggactgtgtc ggtaggcaga 1980 tgggcctacg cctgaattgc aggaaaagcg cggttctgtctatgataccg gatggccacc 2040 gcaagaagca tcactacctg actgagcgaa ccttcaatattggaggtaag ccgctcaggc 2100 aggtgagttg tgttgagcgg tggcgatatc ttggtgtcgattttgaggcc tctggatgcg 2160 tgacattaga gcatagtatc agtagtgctc tgaataacatctcaagggca cctctcaaac 2220 cccaacagag gttggagatt ttgagagctc atctgattccgagattccag cacggttttg 2280 tgcttggaaa catctcggat gaccgattga gaatgctcgatgtccaaatc cggaaagcag 2340 tcggacagtg gctaaggcta ccggcggatg tgcccaaggcatattatcac gccgcagttc 2400 aggacggcgg cttagcgatc ccatcggtgc gagcgaccatcccggacctc attgtgaggc 2460 gtttcggggg gctcgactcg tcaccatggt cagtggcaagagccgccgcc aaatctgata 2520 agattcgtaa gaaactgcgg tgggcctgga aacagctccgcaggttcagc cgtgttgact 2580 ccacaacgca acgaccatct gtgcgcttgt tttggcgagaacatctgcat gcatctgttg 2640 atggacgcga acttcgcgaa tccacacgca ccccgacatccacaaagtgg attagggagc 2700 gatgcgcgca gataaccgga cgggacttcg tgcagttcgtgcacactcat atcaacgccc 2760 tcccatcccg cattcgcgga tcgagagggc gtagaggtgggggtgagtct tcgttgacct 2820 gccgtgctgg ttgcaaggtt agggagacga cggctcacatcctacaacag tgtcacagaa 2880 cacacggcgg ccggattcta cgacacaaca agattgtatctttcgtggcg aaagccatgg 2940 aagagaacaa gtggacggtt gagctggagc cgaggctacgaacatcggtt ggtctccgta 3000 agccggatat tatcgcctcc agggatggtg tcggagtgatcgtggacgtg caggtggtct 3060 cgggccagcg atcgcttgac gagctccacc gtgagaaacgtaataaatac gggaatcacg 3120 gggagctggt tgagttggtc gcaggtagac taggacttccgaaagctgag tgcgtgcgag 3180 ccacttcgtg cacgatatct tggaggggag tatggagcctgacttcttat aaggagttaa 3240 ggtccataat cgggcttcgg gaaccgacac tacaaatcgttccgatactg gcgttgagag 3300 
gttcacacat gaactggacc aggttcaatc agatgacgtccgtcatgggg ggcggcgttg 3345 gttga

The complete amino acid and cDNA nucleotide sequences are also reported, respectively, at Genbank Accession Nos. AAB59214 and M16558, each of which is hereby incorporated by reference in its entirety.
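The correspondence between SEQ ID No: 23 and SEQ ID No: 22 can be spot-checked by translating the first codons of the nucleotide sequence. The following sketch assumes the standard genetic code; the helper names are hypothetical, and only the first 39 nucleotides of the listing are used.

```python
# Standard genetic code, built from the conventional t/c/a/g ordering.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna):
    """Translate complete codons of a DNA string into three-letter residues."""
    dna = "".join(dna.lower().split())  # drop the spaces used in the listing
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    return " ".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


# First 39 nucleotides of SEQ ID No: 23 as listed above.
start_of_seq23 = "atgatggcga gcaccgcact gtcccttatg ggacggtgt"
print(translate(start_of_seq23))
# -> Met Met Ala Ser Thr Ala Leu Ser Leu Met Gly Arg Cys
```

The output agrees with the first thirteen residues of SEQ ID No: 22.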

In addition to the protein encoded by the R2 element of Bombyx mori, other proteins possessing reverse transcriptase activity which are encoded by different non-LTR retrotransposable R2 elements can also be employed in the methods of the present invention. A number of other arthropods are known to harbor R2 elements which exhibit a similar structure to the R2 element of Bombyx mori (Burke et al., “The Domain Structure and Retrotransposition Mechanism of R2 Elements Are Conserved Throughout Arthropods,” Mol. Biol. Evol. 16(4):502-511 (1999); Yang et al., “Identification of the Endonuclease Domain Encoded by R2 and Other Site-Specific, Non-Long Terminal Repeat Retrotransposable Elements,” Proc. Natl. Acad. Sci. USA 96:7847-7852 (1999), each of which is hereby incorporated by reference in its entirety). The R2 elements of other arthropods include, without limitation, R2 elements from Drosophila spp. (fruit fly), Forficula auricularia (earwig), Popillia japonica (Japanese beetle), Nasonia vitripennis (jewel wasp), Tenebrio molitor (mealworm), Collembola spp. (springtails), Isopoda spp. (pillbugs), and Limulus polyphemus (horseshoe crab).

The protein and encoding DNA sequences for the R2 element of D. melanogaster are reported, respectively, at Genbank Accession Nos. P16423 and X51967, each of which is hereby incorporated by reference in its entirety. The protein and encoding DNA sequences for the R2 element of D. mercatorum are reported, respectively, at Genbank Accession Nos. AAB94032 and AF015685, each of which is hereby incorporated by reference in its entirety. The protein and encoding DNA sequences for the R2 element of P. japonica are reported, respectively, at Genbank Accession Nos. AAB66358 and L00949, each of which is hereby incorporated by reference in its entirety. The protein and encoding DNA sequences for the R2 element of N. vitripennis are reported, respectively, at Genbank Accession Nos. AAC34927 and L00950, each of which is hereby incorporated by reference in its entirety.

Other non-LTR retrotransposon elements and their proteins can be readily identified by isolating putative non-LTR retrotransposon element proteins and testing them for homology with the above-listed R2 proteins as well as testing them for endonuclease and target-primed reverse transcriptase activity as described, for example, in Luan et al., “Reverse Transcription of R2Bm RNA is Primed by a Nick at the Chromosomal Target Site: A Mechanism for Non-LTR Retrotransposition,” Cell 72:595-605 (1993); Luan et al., “RNA Template Requirements for Target DNA-Primed Reverse Transcription by the R2 Retrotransposable Element,” Mol. Cell Biol. 15(7):3882-3891 (1995); Luan et al., “Downstream 28S Gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell Biol. 16(9):4726-4734 (1996), each of which is hereby incorporated by reference in its entirety. Once identified, DNA molecules encoding the non-LTR retrotransposon protein can be isolated using standard techniques known to those skilled in the art.

Fragments of the above-identified non-LTR retrotransposon proteins can also be utilized in accordance with the present invention. It has previously been demonstrated that the protein encoded by the R2 element of a number of arthropods possesses multiple functional domains, including an N-terminal DNA binding domain, a central reverse transcriptase domain, and a C-terminal endonuclease domain (Burke et al., “The Domain Structure and Retrotransposition Mechanism of R2 Elements Are Conserved Throughout Arthropods,” Mol. Biol. Evol. 16(4):502-511 (1999); Yang et al., “Identification of the Endonuclease Domain Encoded by R2 and Other Site-Specific, Non-Long Terminal Repeat Retrotransposable Elements,” Proc. Natl. Acad. Sci. USA 96:7847-7852 (1999), each of which is hereby incorporated by reference in its entirety).

Suitable fragments can be produced by several means. Subclones of the gene encoding a known non-LTR retrotransposon protein can be produced using conventional molecular genetic manipulation for subcloning gene fragments, such as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989), and Ausubel et al. (ed.), Current Protocols in Molecular Biology, John Wiley & Sons (New York, N.Y.) (1999 and preceding editions), each of which is hereby incorporated by reference in its entirety. The subclones then are expressed in vitro or in vivo in bacterial cells to yield a smaller protein or polypeptide that can be tested for reverse transcriptase activity, e.g., using known procedures or procedures set forth in U.S. Pat. No. 6,100,039 to Burke et al. and U.S. Pat. No. 6,132,995 to Gronowitz et al., each of which is hereby incorporated by reference in its entirety.

In another approach, based on knowledge of the primary structure of the non-LTR retrotransposon protein, fragments of the gene may be synthesized using the PCR technique together with specific sets of primers chosen to represent particular portions of the protein, i.e., encoding a fragment having reverse transcriptase activity (see Erlich et al., “Recent Advances in the Polymerase Chain Reaction,” Science 252:1643-51 (1991), which is hereby incorporated by reference in its entirety). These can then be cloned into an appropriate vector for expression of a truncated protein or polypeptide from bacterial cells.

Fusion proteins which include the reverse transcriptase can also be used in accordance with the invention. Such fusion proteins may comprise, for example, a carrier protein which has a leader sequence of hydrophobic amino acids at the amino terminus of the reverse transcriptase domain. This carrier protein is normally excreted through the membrane of the cell within which it is made. By cleavage of the hydrophobic leader sequence during excretion, a means is provided for producing a polypeptide having reverse transcriptase activity, which can be recovered either from the periplasmic space or the medium in which the bacterium is grown. The use of such a carrier protein allows isolation of polypeptides having reverse transcriptase activity without contamination by other proteins within the bacterium, and may achieve production of a form of reverse transcriptase having greater stability by avoiding the enzymes within the bacterial cell which degrade foreign proteins. The DNA and amino acid sequences for such hydrophobic leader sequences, as well as methods of preparing such fusion proteins, are taught, e.g., in U.S. Pat. No. 4,411,994 to Gilbert et al., which is hereby incorporated by reference in its entirety.

It is also possible to prepare fusion proteins comprising a polypeptide having reverse transcriptase activity that is linked via a peptide bond at the amino or carboxy terminus with polypeptides which stabilize or change the solubility of the polypeptide having reverse transcriptase activity. An amino-terminal gene fusion which encodes reverse transcriptase, having both DNA polymerase and RNase activity, and trpE is taught, e.g., by Tanese et al., Proc. Natl. Acad. Sci. USA 82:4944-4948 (1985), which is hereby incorporated by reference in its entirety. A carboxy-terminal gene fusion which encodes reverse transcriptase and part of the plasmid pBR322 tet gene is taught, e.g., by Kotewicz et al., Gene 35:249-258 (1985); and Gerard, DNA 5:271-279 (1986), each of which is hereby incorporated by reference in its entirety.

A DNA molecule encoding the non-LTR retrotransposon protein or polypeptide having reverse transcriptase activity can be incorporated in cells using conventional recombinant DNA technology. Generally, this involves inserting the DNA molecule into an expression system to which the DNA molecule is heterologous (i.e., not normally present). The heterologous DNA molecule is inserted into the expression system or vector in sense orientation and correct reading frame. Depending on the vector, the DNA molecule can be ligated to appropriate regulatory sequences either prior to its insertion into the vector (i.e., as a chimeric gene) or at the time of its insertion (i.e., thereby forming the chimeric gene). The DNA molecule can be cloned into the vector using standard cloning procedures in the art, as described by Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1982), which is hereby incorporated by reference in its entirety.

U.S. Pat. No. 4,237,224 to Cohen and Boyer, which is hereby incorporated by reference in its entirety, describes the production of expression systems in the form of recombinant plasmids using restriction enzyme cleavage and ligation with DNA ligase. These recombinant plasmids are then introduced by means of transformation and replicated in unicellular cultures including prokaryotic organisms and eukaryotic cells grown in tissue culture.

Recombinant genes may also be introduced into viruses, such as vaccinia virus. Recombinant viruses can be generated by transfection of plasmids into cells infected with virus.

Suitable vectors include, but are not limited to, viral vectors such as lambda vector system gt11, gt WES.tB, and Charon 4, and plasmid vectors such as pBR322, pBR325, pACYC177, pACYC184, pUC8, pUC9, pUC18, pUC19, pLG339, pR290, pKC37, pKC101, SV 40, pBluescript II SK +/− or KS +/− (see “Stratagene Cloning Systems” Catalog (1993) from Stratagene, La Jolla, Calif., which is hereby incorporated by reference in its entirety), pQE, pIHS21, pGEX, pET series (see Studier et al., “Use of T7 RNA Polymerase to Direct Expression of Cloned Genes,” Gene Expression Technology, vol. 185 (1990), which is hereby incorporated by reference in its entirety), and any derivatives thereof. Suitable vectors are continually being developed and identified.

Recombinant molecules can be introduced into host cells via transformation, transduction, conjugation, mobilization, or electroporation.

A variety of host-vector systems may be utilized to express the protein-encoding sequence(s). Primarily, the vector system must be compatible with the host cell used. Host-vector systems include but are not limited to the following: bacteria transformed with bacteriophage DNA, plasmid DNA, or cosmid DNA; microorganisms such as yeast containing yeast vectors; mammalian cell systems infected with virus (e.g., vaccinia virus, adenovirus, etc.); insect cell systems infected with virus (e.g., baculovirus); and plant cells infected by bacteria or transformed via particle bombardment (i.e., biolistics). The expression elements of these vectors vary in their strength and specificities. Depending upon the host-vector system utilized, any one of a number of suitable transcription and translation elements can be used.

Different genetic signals and processing events control many levels of gene expression (e.g., DNA transcription and messenger RNA (“mRNA”) translation).

Transcription of DNA is dependent upon the presence of a promoter, which is a DNA sequence that directs the binding of RNA polymerase and thereby promotes mRNA synthesis. The DNA sequences of eukaryotic promoters differ from those of prokaryotic promoters. Furthermore, eukaryotic promoters and accompanying genetic signals may not be recognized in or may not function in a prokaryotic system, and, further, prokaryotic promoters typically are not recognized and do not function in eukaryotic cells.

Similarly, translation of mRNA in prokaryotes depends upon the presence of the proper prokaryotic signals, which differ from those of eukaryotes. Efficient translation of mRNA in prokaryotes requires a ribosome binding site called the Shine-Dalgarno (“SD”) sequence on the mRNA. This sequence is a short nucleotide sequence of mRNA that is located before the start codon, usually AUG, which encodes the amino-terminal methionine of the protein. The SD sequences are complementary to the 3′-end of the 16S rRNA (ribosomal RNA) and probably promote binding of mRNA to ribosomes by duplexing with the rRNA to allow correct positioning of the ribosome. For a review on maximizing gene expression, see Roberts and Lauer, Methods in Enzymology, 68:473 (1979), which is hereby incorporated by reference in its entirety.

Promoters vary in their “strength” (i.e., their ability to promote transcription). For the purposes of expressing a cloned gene, it is desirable to use strong promoters in order to obtain a high level of transcription and, hence, expression of the gene. Depending upon the host cell system utilized, any one of a number of suitable promoters may be used. For instance, when cloning in E. coli, its bacteriophages, or plasmids, promoters such as the T7 phage promoter, lac promoter, trp promoter, recA promoter, ribosomal RNA promoter, the P_(R) and P_(L) promoters of coliphage lambda and others, including, but not limited to, lacUV5, ompF, bla, lpp, and the like, may be used to direct high levels of transcription of adjacent DNA segments. Additionally, a hybrid trp-lacUV5 (tac) promoter or other E. coli promoters produced by recombinant DNA or other synthetic DNA techniques may be used to provide for transcription of the inserted gene.

Bacterial host cell strains and expression vectors may be chosen which inhibit the action of the promoter unless specifically induced. In certain operons, the addition of specific inducers is necessary for efficient transcription of the inserted DNA. For example, the lac operon is induced by the addition of lactose or IPTG (isopropylthio-beta-D-galactoside). A variety of other operons, such as trp, pro, etc., are under different controls.

Specific initiation signals are also required for efficient gene transcription and translation in prokaryotic cells. These transcription and translation initiation signals may vary in “strength” as measured by the quantity of gene specific messenger RNA and protein synthesized, respectively. The DNA expression vector, which contains a promoter, may also contain any combination of various “strong” transcription and/or translation initiation signals. For instance, efficient translation in E. coli requires a Shine-Dalgarno (“SD”) sequence about 7-9 bases 5′ to the initiation codon (“ATG”) to provide a ribosome binding site. Thus, any SD-ATG combination that can be utilized by host cell ribosomes may be employed. Such combinations include, but are not limited to, the SD-ATG combination from the cro gene or the N gene of coliphage lambda, or from the E. coli tryptophan E, D, C, B or A genes. Additionally, any SD-ATG combination produced by recombinant DNA or other techniques involving incorporation of synthetic nucleotides may be used.
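As a non-limiting illustration, a candidate SD-ATG combination in an expression construct may be located computationally. The sketch below rests on several assumptions that are not taken from this description: the motif AGGAGG is used as a stand-in SD core, the 7-9 base spacing is measured from the last base of the motif to the A of the ATG, and the function name and test sequence are hypothetical.

```python
import re


def find_sd_atg(dna, sd_motif="AGGAGG", min_gap=7, max_gap=9):
    """Return (sd_start, atg_start) index pairs where an ATG lies
    min_gap..max_gap bases downstream of the end of the SD motif."""
    dna = dna.upper()
    hits = []
    # The lookahead lets overlapping motif occurrences be reported.
    for match in re.finditer("(?=%s)" % re.escape(sd_motif), dna):
        sd_end = match.start() + len(sd_motif)
        for gap in range(min_gap, max_gap + 1):
            atg_start = sd_end + gap
            if dna[atg_start:atg_start + 3] == "ATG":
                hits.append((match.start(), atg_start))
    return hits


# Hypothetical construct: 3 nt leader, SD core, 8 nt spacer, start codon.
construct = "TTT" + "AGGAGG" + "ACAGCTAT" + "ATGAAA"
print(find_sd_atg(construct))
# -> [(3, 17)]
```

A construct whose spacer falls outside the chosen window returns an empty list, which may serve as a prompt to adjust the spacing before expression is attempted.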

Once the DNA molecules encoding the non-LTR retrotransposon protein or polypeptide having reverse transcriptase activity, as described above, have been cloned into an expression system, they are ready to be incorporated into a host cell. Such incorporation can be carried out by the various forms of transformation noted above, depending upon the vector/host cell system. Suitable host cells include, but are not limited to, bacteria, virus, yeast, mammalian cells, insect, plant, and the like.

The transformed hosts of the invention may be cultured under protein producing conditions according to any of the methods which are known to those skilled in the art.

The non-LTR retrotransposon protein or polypeptide having reverse transcriptase activity may be isolated according to conventional methods known to those skilled in the art. For example, the cells may be collected by centrifugation, washed with suitable buffers, lysed and sonicated, and the reverse transcriptase isolated by column chromatography, for example, on DEAE-cellulose, phosphocellulose (see Kotewicz et al., Gene 35:249-258 (1985), which is hereby incorporated by reference in its entirety) or other standard isolation and identification techniques using, for example, polyribocytidylic acid-agarose, or hydroxylapatite or by electrophoresis or immunoprecipitation. The non-LTR retrotransposon protein or polypeptide is preferably produced in purified form (preferably, at least about 80%, more preferably at least about 90%, pure).

Once expressed and isolated, the non-LTR retrotransposon protein or polypeptide can subsequently be used in accordance with the present invention.

According to one aspect of the present invention, the non-LTR retrotransposon protein or polypeptide is used to prepare cDNA from RNA. This can be achieved by contacting an RNA molecule, in the presence of dNTPs, with a non-LTR retrotransposon protein or polypeptide having reverse transcriptase activity (as described above) under conditions effective for production of a cDNA molecule complementary to the RNA molecule, where the contacting is carried out in the absence of a target DNA molecule of the non-LTR retrotransposon protein or polypeptide. Thereafter, the resulting cDNA can be isolated.

Basically, in the presence of RNA (i.e., a plurality of RNA molecules) and dNTPs, the non-LTR retrotransposon protein or polypeptide will use the 3′ end of one RNA molecule to prime reverse transcription of another RNA, which can be the same or different from the RNA acting as primer. This is illustrated in step (1) of FIG. 12. The protein or polypeptide, characterized by a high degree of processivity, will likely continue to the end of the RNA template as shown in step (2), at which point it may, but need not, jump to a second RNA template as shown in step (3). Reverse transcription is again likely to continue to the end of the second RNA template as shown in step (4). Another template jump may or may not occur. In most instances, one or more of the RNA molecules which are reverse transcribed will include a region of interest (i.e., for which one or more cDNA copies are desired). It may also be desired to specifically include in the reaction mixture acceptor and/or donor RNA molecules having known sequences. Their known sequences can be used to anneal primers for subsequent amplification procedures (infra).
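The sequence of a cDNA produced across one or more template jumps can be predicted with a simple model. The following sketch is an idealization offered for illustration only: it assumes that each template is copied completely from its 3′ end to its 5′ end and that no bases are gained or lost at the junction between templates. The function names and the example RNAs are hypothetical.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "u": "a"}


def reverse_transcribe(rna):
    """cDNA (written 5'->3') copied from an RNA template (written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.lower()))


def transcribe_with_jumps(templates):
    """cDNA produced when the enzyme copies each template in turn,
    jumping from the 5' end of one to the 3' end of the next."""
    return "".join(reverse_transcribe(rna) for rna in templates)


# Hypothetical donor (copied first) and acceptor (copied after the jump).
donor_rna = "gggaaa"
acceptor_rna = "cccuuu"
print(transcribe_with_jumps([donor_rna, acceptor_rna]))
# -> tttcccaaaggg
```

Because the copy of the first template lies at the 5′ end of the cDNA and the copy of the last template lies at the 3′ end, donor and acceptor RNAs of known sequence would flank the region of interest and provide primer annealing sites.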

The target DNA sequences for a number of different non-LTR retrotransposons, in particular R2 elements, have been identified previously (Burke et al., “The Domain Structure and Retrotransposition Mechanism of R2 Elements Are Conserved Throughout Arthropods,” Mol. Biol. Evol. 16(4):502-511 (1999), which is hereby incorporated by reference in its entirety). For the R2 element of Bombyx mori, the target DNA molecule has a nucleotide sequence according to SEQ ID No: 24 as follows:

taaacggcgg gagtaactat gactctctta aggtagccaa atgcctcgtc 50

The cleavage site is between positions 31 and 32 of SEQ ID No: 24. The nick site on the opposite strand is two bases downstream from the cleavage site (Luan et al., “Reverse Transcription of R2Bm RNA Is Primed by a Nick at the Chromosomal Target Site: A Mechanism for non-LTR Retrotransposition,” Cell 72:595-605 (1993), which is hereby incorporated by reference in its entirety).
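The position of the cleavage site within SEQ ID No: 24 can be made concrete by splitting the listed sequence between positions 31 and 32. The sketch below uses 1-based positions as in the text; the function name is hypothetical, and only the strand shown in the listing is modeled.

```python
# SEQ ID No: 24 as listed above, with the grouping spaces removed.
TARGET = "taaacggcgggagtaactatgactctcttaaggtagccaaatgcctcgtc"


def split_at_cleavage(seq, last_upstream_position=31):
    """Split a sequence after the given 1-based position."""
    return seq[:last_upstream_position], seq[last_upstream_position:]


upstream, downstream = split_at_cleavage(TARGET)
print(len(TARGET), len(upstream), len(downstream))
# -> 50 31 19
```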

There is preferably a sufficient time delay between the steps of contacting and isolating, as described above. Suitable time delays include, without limitation, preferably at least about 30 seconds, more preferably between about 1 minute and 2 hours, even more preferably between about 10 minutes and 2 hours. The synthesis of a complete cDNA may be accomplished by adding the R2 protein or polypeptide and all four dNTPs with the RNA template. The reverse transcription can be carried out under substantially isothermic conditions or under variable temperature conditions. Suitable temperatures range from about 20° C. to about 40° C., preferably about 21° C. to about 35° C., and most preferably at about 22° C. to about 32° C. The particular temperature employed will depend, at least in part, on the desired cDNA product one wishes to obtain, as a greater percentage of full length cDNA products can be obtained using temperatures at about 25° C. (i.e., about 22° C. to about 28° C.), while an increase in the total yield of cDNA product can be achieved at higher temperatures.

Use of the non-LTR retrotransposon protein or polypeptide, in particularthe R2 protein or polypeptide, offers a number of distinct advantagesover retroviral reverse transcriptases. With respect to the RNAmolecule, the RNA does not need a particular primer site; hence, it doesnot require a polyadenylation region as needed by retroviral RT.However, when polyadenylated RNA molecules are reverse transcribed, thepolyadenylated region affords a primer binding site that can be used forprimer-directed cDNA extension, resulting in a known polyT region at the5′ end of a cDNA molecule. In addition, non-LTR retrotransposon proteinsor polypeptides like the R2 protein or polypeptide are capable ofcarrying out reverse transcription irrespective of the RNA structure.Retroviral RTs frequently stop at certain sequences or in regions whichcontain a secondary structure, such as stem or stem/loop formations orduplex formations upstream of the template extension, whereas the R2protein or polypeptide does not.

With respect to the reverse transcription process, the R2 protein or polypeptide is characterized by a significantly greater processivity than retroviral reverse transcriptases. The R2 protein or polypeptide is characterized by a speed of about 880 nt per minute, which is comparable to retroviral reverse transcriptases. More important, though, is the stability of the R2 protein or polypeptide once it has started reverse transcription. Because of its stability, the R2 protein or polypeptide is capable of preparing a population of cDNAs where a significant portion of the cDNA molecules are substantially full length reverse transcripts of the RNA template. By substantially full length, it is intended that the cDNAs are at least about 85 percent of the RNA template length, more preferably at least about 90 percent of the RNA template length, even more preferably about 95 percent of the RNA template length. By significant portion, it is intended to denote at least twice as much as can be prepared using the AMV reverse transcriptase. For example, using a 600 nt template at about 25° C., the R2 protein or polypeptide can prepare a population of cDNA molecules where about 22% are full length, while the AMV RT, using the same template at about 37° C., can only prepare a population of cDNA molecules where about 1.2% are full length.
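The two definitions above ("substantially full length" and "significant portion") can be restated as simple arithmetic. The following Python sketch is illustrative only; the function names are hypothetical, and the 85 percent cutoff and the twofold criterion are taken from the text.

```python
def is_substantially_full_length(cdna_nt: int, template_nt: int,
                                 threshold_percent: int = 85) -> bool:
    """True if the cDNA is at least threshold_percent of the template length.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return cdna_nt * 100 >= threshold_percent * template_nt


def fold_over_reference(r2_fraction: float, reference_fraction: float) -> float:
    """Ratio of full-length fractions (R2 versus a reference RT such as AMV)."""
    return r2_fraction / reference_fraction


# 600 nt template example from the text: about 22% (R2) versus about 1.2% (AMV).
fold = fold_over_reference(0.22, 0.012)
print(is_substantially_full_length(510, 600), round(fold, 1), fold >= 2)
```

With the figures quoted for the 600 nt template, the ratio comfortably exceeds the twofold criterion used to define a significant portion.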

According to another aspect of the present invention, the initial reverse transcription process is followed by an amplification procedure, whereby the isolated cDNA is amplified using any one of a number of suitable amplification procedures.

By way of example, the PCR amplification can be performed following isolation of the cDNA. Because the PCR utilizes primers to initiate second strand synthesis, the cDNA molecules prepared during the reverse transcription process should be labeled at their 5′ ends with a sequence which will hybridize with suitable PCR primers. Two approaches can be utilized to label the cDNAs.

According to one approach, an oligoC tail can be added at the 3′ end of the cDNA transcripts by incubating them with terminal transferase and dCTPs (Chang et al., Nature 275:617-624 (1978); Maniatis et al., Molecular Cloning, Cold Spring Harbor Laboratory (1982), each of which is hereby incorporated by reference in its entirety). A primer which anneals to the oligoC tail can be used in subsequent PCR amplification.

According to a second approach, where the sequence of only a portion ofan RNA molecule is known, directed template jumping can be employed toprepare cDNAs starting within the known sequence of the RNA, extendingthrough the unknown sequence (i.e., region of interest) at its 5′ endand having a known sequence located at the 5′ end. This cDNA isimmediately available for PCR amplification, because the unknownsequence (i.e., region of interest) is flanked by known sequences whichcan be annealed by PCR primers. Alternatively, the 3′ end of a partiallyknown RNA molecule can be obtained again using the template jumpingability of R2. Reverse transcription is primed from a known donor RNAsequence, the reverse transcriptase jumps to the 3′ end of the partiallyknown RNA and continues synthesis past the region of known sequence. PCRamplification is again possible, because the cDNA product contains itsunknown sequences (i.e., region of interest) flanked by known sequenceswhich can be annealed by primers.

Basically, the PCR process is carried out in step-wise fashion using alternating steps of annealing primers, extending primers to achieve complementary strand synthesis, followed by strand dissociation. Beginning with an isolated ss cDNA (or pool of ss cDNAs) containing a region of interest, a first primer is annealed to the ss cDNA molecule at a position 3′ of the region of interest and then the primer is extended to form a complementary DNA strand including a complement of the region of interest. This complementary DNA strand can then be dissociated from the ss cDNA, at which time it is available for annealing by a second primer at a position 3′ of the complement of the region of interest. Primer extension is carried out to form a second complementary DNA strand which is substantially the same as the single-stranded cDNA molecule at the region of interest. Upon dissociating the second complementary cDNA molecule from the complementary DNA strand, the entire process can be repeated indefinitely to amplify the quantity of cDNA molecules which contain the region of interest or a complement thereof.

The non-LTR retrotransposon protein or polypeptide is ideally suited forincorporation into a kit which is useful for the preparation of cDNAfrom RNA. Such a kit may include a carrier device compartmentalized toreceive one or more containers, such as vials, tubes, and the like, eachof which includes one of the separate elements used to prepare cDNA fromRNA. For example, there may be provided a first container, the contentsof which include the non-LTR protein or polypeptide in solution.Further, any number of additional containers can be provided, thecontents of which independently include suitable buffers, substrates forDNA synthesis such as the deoxynucleotide triphosphates (e.g., dATP,dCTP, dGTP, and dTTP) either individually or collectively in a suitablesolution, a terminal transferase in solution, donor RNA having a knownnucleotide sequence for use as an RT primer to obtain a 3′ end of RNA,and acceptor RNA having a known nucleotide sequence to obtain a 5′ endof RNA. Any combinations of the above components can be provided.

The R2 protein or polypeptide may be present at about 0.1 μg/ml to about1 μg/ml, preferably about 200 ng/ml to about 500 ng/ml. The bufferconditions for the reverse transcription can range from about 50 toabout 200 mM NaCl, about 1-10 mM MgCl₂, about 0.0 to 0.2% Triton X-100,about 10 to about 250 μM deoxynucleotide triphosphates, at a pH fromabout 7 to about 8.5. The donor and acceptor RNAs for template jumps canbe at concentration from about 0.5 to about 20 μg/ml. The terminaltransferase, if employed, may be present at a concentration of about 0.1μg/ml to about 100 μg/ml, preferably about 5 μg/ml to about 50 μg/ml.
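The ranges listed above lend themselves to a simple lookup table. The following Python sketch, offered only as an illustration, checks a proposed reaction mixture against those ranges; the dictionary keys and the function name are hypothetical, and the example mixture uses the standard reaction buffer described in the Examples.

```python
# Approximate ranges from the text, as (low, high); units are in the key names.
RT_RANGES = {
    "NaCl_mM": (50, 200),
    "MgCl2_mM": (1, 10),
    "triton_x100_percent": (0.0, 0.2),
    "dNTP_uM": (10, 250),
    "pH": (7.0, 8.5),
    "R2_protein_ug_per_ml": (0.1, 1.0),
}


def out_of_range(conditions: dict) -> list:
    """Return the names of supplied conditions that fall outside RT_RANGES."""
    flagged = []
    for name, value in conditions.items():
        low, high = RT_RANGES[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged


# Standard RT reaction buffer from the Examples (Tris-HCl and DTT not range-checked).
standard = {"NaCl_mM": 200, "MgCl2_mM": 10, "triton_x100_percent": 0.01,
            "dNTP_uM": 25, "pH": 7.5}
print(out_of_range(standard))
print(out_of_range({"NaCl_mM": 400, "pH": 7.5}))
```

Because every range in the text is qualified by "about", the hard boundaries used here are a simplification.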

EXAMPLES

The following examples are provided to illustrate embodiments of the present invention, but they are by no means intended to limit its scope. The materials and methods described below were utilized in the following examples.

Preparations of Target DNA

The DNA substrate for the TPRT reaction was a 164 nt segment of the 28S rRNA gene generated by PCR amplification from clone pB109 using primer AB.j44 (SEQ ID No: 37, Table I) complementary to the 28S sequence 54 bp upstream of the R2 insertion site and primer AB.25 complementary to the region 110 bp downstream of the R2 site. The PCR was carried out in 50 μl reactions containing 10 ng of pB109, 200 ng each primer, 50 μCi of [α-³²P]dCTP (3,000 Ci/mmol, New England Nuclear), 200 μM each dATP, dGTP and dTTP, 100 μM dCTP, and 2.5-5 U Taq DNA polymerase (Life Technologies). The DNA strand designed to be used as primer was twice as long as that used in previous assays (Yang and Eickbush, “RNA-induced Changes in the Activity of the Endonuclease Encoded by the R2 Retrotransposable Element,” Mol. Cell. Biol. 18:3455-3465 (1998); Yang et al., “Identification of the Endonuclease Domain Encoded by R2 and Other Site-specific, non-Long Terminal Repeat Retrotransposable Elements,” Proc. Natl. Acad. Sci. USA 96:7847-7852 (1999), each of which is hereby incorporated by reference in its entirety) in order to increase the number of [α-³²P]dCTP residues that could be incorporated by PCR and, thus, increase the sensitivity of the assay. The PCR amplification products were separated on 8% native polyacrylamide gels, and the 164 bp band was cut from the gel and eluted at room temperature in 0.3 M sodium acetate pH 5.2, 0.03% SDS. The elution buffer was extracted with phenol/chloroform and the DNA recovered by ethanol precipitation.

Preparations of RNA Templates

All RNA templates were generated by in vitro run-off transcription using either T7 or T3 RNA polymerase (Fermentas Inc., Life Technologies). Templates were either restriction digested pBSII(SK−) plasmids or PCR amplified products containing the T7 promoter. The 254 nt R2 RNA was transcribed from a template generated by PCR amplification of pBmR2-249A4 (Luan et al., “Reverse Transcription of R2Bm RNA is Primed by a Nick at the Chromosomal Target Site: A Mechanism for non-LTR Retrotransposition,” Cell 72:595-605 (1993), which is hereby incorporated by reference in its entirety) using primers AB.13 (SEQ ID No: 29, Table I) and AB.2b (SEQ ID No: 26, Table I). The 274 nt R2 RNA was transcribed from a template generated by PCR amplification of R2Bm249V5′3′ (Luan and Eickbush, “Downstream 28S Gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell. Biol. 16:4726-4734 (1996), which is hereby incorporated by reference in its entirety) using primers AB.13 (SEQ ID No: 29, Table I) and AB.9 (SEQ ID No: 28, Table I). These R2 RNA templates differed from those used previously (Yang and Eickbush, “RNA-induced Changes in the Activity of the Endonuclease Encoded by the R2 Retrotransposable Element,” Mol. Cell. Biol. 18:3455-3465 (1998); Yang et al., “Identification of the Endonuclease Domain Encoded by R2 and Other Site-specific, non-Long Terminal Repeat Retrotransposable Elements,” Proc. Natl. Acad. Sci. USA 96:7847-7852 (1999), each of which is hereby incorporated by reference in its entirety), in that the RNA did not include 30 nt of pBSII(SK−) sequence at the 5′ end of the RNA. The presence of these C-rich plasmid sequences reduces the efficiency of the TPRT reaction.

The 334 nt vector RNA was transcribed from the pBSII(SK−) plasmid(Stratagene) predigested with PvuII. The 183 nt vector RNA wastranscribed from a template generated from amplification of pBSII(SK−)using primers AB.8 (SEQ ID No: 27, Table I) and AB.T7 (SEQ ID No: 36,Table I). The 600 nt RNA was transcribed like the 334 nt RNA except thata KpnI and BamHI fragment of the R1Dm element (position 5020-5340) wascloned into the polylinker region of pBSII(SK−) (provided by D.Eickbush). The 1090 nt vector RNA was transcribed from pBSII(SK−)predigested with XmnI using T3 RNA polymerase. Finally, the 177 nt donorRNA in FIG. 7 was transcribed from a PCR template using primers AB.1(SEQ ID No: 25, Table I) and AB.9 (SEQ ID No: 28, Table I) and theplasmid pB108 (Xiong and Eickbush, “Functional Expression of aSequence-specific Endonuclease Encoded by the Retrotransposon R2Bm,”Cell 55:235-246 (1988), which is hereby incorporated by reference in itsentirety).

The in vitro transcription was performed in 80 μl volumes containing 2-5μg of pre-digested plasmid DNA or gel purified PCR fragments, 16 μl 5×transcription buffer, 1 mM each NTP and 150 U of T7 or T3 RNA polymerase(Fermentas Inc., Life Technologies). Reactions were incubated at 37° C.until a pyrophosphate precipitation formed (approximately 1.5 hr). Aftersynthesis the samples were diluted 2-fold, mixed with DNase I buffer andthe DNA templates removed with 10 U of DNase I (Ambion Inc.) for 25 minat 37° C. The products of transcription were ethanol precipitated andseparated on 5% Urea-PAGE. Full-length RNA templates were excised fromthe gel, eluted at room temperature in 0.3M sodium acetate pH 5.2, 0.03%SDS for 1.5 hr, extracted with phenol/chloroform and ethanolprecipitated. The transcripts were dissolved in 50 mM NaCl to a finalRNA concentration of 0.1 μg/μl.

Synthesis of ³²P-labeled 254 nt R2 RNA for the gel shift experiments was performed according to the Fermentas Inc. protocol for the synthesis of high specific activity radiolabeled RNA using T7 RNA polymerase. RNA was transcribed from 1 μg of the 254 nt R2 PCR product and DNA template removed by incubation with 2 U of DNase I for 15 min at 37° C. RNA yields after purification from 5% Urea-PAGE were determined by scintillation counting and the known specific activity of labeled nucleotide in the reaction.

TABLE I
Definition of Primer Sequences

Primer    Nucleotide Sequence                                       SEQ ID No:
AB.1      CTGCAGTAATACGACTCACTATAGGACTTGGGGAATCCGACT                25
AB.2b     TTTTCATCGCCGGATCATC                                       26
AB.8      GGAAACAGCTATGACCATG                                       27
AB.9      GATGACGAGGCATTTGGCTA                                      28
AB.13     CTGCAGTAATACGACTCACTATAGGTTGAGCCTTGCACAGTAG               29
AB.17     CGACGGCCAGTGCCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCC        30
AB.18     CGGGATCCGAAGCCAAGGGAGCGAG                                 31
AB.19     GCTCTAGAGCGTACGGCCACGATC                                  32
AB.23     GGGGTACCGACAGGTTTCCCGACTG                                 33
AB.25     GCTCTAGAGTTCCCTTGGCTGTGGT                                 34
AB.26     GCTCTAGAGCAAGCAAGCGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTAA    35
AB.T7     TAATACGACTCACTATAG                                        36
AB.j44    AATTCAAGCAAGCGCGG                                         37
AB.34     CGTTCTTCGGGGCGAAAACTC                                     38

Reverse Transcription Assays

Unless otherwise specified, all RT reactions were performed in 30 μl volumes containing 50 mM Tris-HCl (pH 7.5), 200 mM NaCl, 10 mM MgCl₂, 2.5 mM DTT, 0.01% Triton X-100 and 25 μM each dNTP. The concentration of the R2 protein was 0.8-4 nM (3-15 ng). In all TPRT reactions, labeled target DNA (see previous section) was present at a concentration of 6-12 nM (20-40 ng). In most other reactions, 15 μCi of [α-³²P]dCTP (3,000 Ci/mmol, New England Nuclear) was added. In the primer extension assays, the DNA oligonucleotides (concentrations as specified in the Figures) were annealed to the RNA by heating to 96° C. and slow cooling (3.5° C./min) to 25° C. For the reactions in FIG. 7, the DNA primer was end-labeled with polynucleotide kinase. End-labeling reactions were performed according to the Fermentas Inc. protocol in a 20 μl volume containing 200 ng primer and 25 μCi of [γ-³²P]ATP (3,000 Ci/mmol, New England Nuclear). The polynucleotide kinase was inactivated by heating to 96° C. for 10 min. All reverse transcription reactions were incubated at 37° C. for 30-50 min, and stopped by heating at 96° C. for 5 min. Unless otherwise indicated, the excess RNA was removed by digestion with 1-2 μg of RNase A for 10 min at 37° C. and the products were ethanol precipitated before electrophoresis.
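The paired mass and molar figures above can be cross-checked with a unit conversion. The following Python sketch is illustrative only: the function name is hypothetical, the molecular weight of roughly 125,000 g/mol for the R2 protein is back-calculated here from the stated 3-15 ng and 0.8-4 nM figures rather than given in the text, and 660 g/mol per base pair is a commonly used approximation for double-stranded DNA.

```python
def nanomolar(mass_ng: float, volume_ul: float, mol_weight: float) -> float:
    """Convert a mass (ng) in a volume (μl) to a concentration in nM.

    1 ng/μl equals 1e-3 g/L; dividing by g/mol gives mol/L; times 1e9 gives nM.
    """
    grams_per_liter = (mass_ng / volume_ul) * 1e-3
    return grams_per_liter / mol_weight * 1e9


R2_MW = 125_000           # assumption: inferred from the stated ng and nM values
TARGET_DNA_MW = 164 * 660  # assumption: 164 bp at about 660 g/mol per base pair

for ng in (3, 15):
    print("R2 protein", ng, "ng ->", round(nanomolar(ng, 30, R2_MW), 2), "nM")
for ng in (20, 40):
    print("target DNA", ng, "ng ->", round(nanomolar(ng, 30, TARGET_DNA_MW), 1), "nM")
```

Under these assumptions the 30 μl reaction volumes reproduce the stated molar ranges to within rounding.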

Mobility Shift Analysis

RNA gel mobility shift reactions were performed with 10 ng of R2 protein and 10 ng of [³²P]-labeled 254 nt R2 RNA in a 20 μl reaction mixture containing 50 mM Tris-HCl (pH 7.5), 200 mM NaCl, 10 mM MgCl₂, 2.5 mM DTT, 0.01% Triton X-100. The RNA and protein were preincubated at 37° C. for 15 min, placed on ice for 5 min, mixed with 2 μl of loading buffer (0.2% bromophenol blue, 0.02% xylene cyanol FF and 60% glycerol), and analyzed on 5% native polyacrylamide gels containing 5% glycerol (1/55 acrylamide/bisacrylamide). The electrophoresis was performed at 4° C. The identical procedure was applied for the DNA mobility shift assay except that 10 ng R2 protein was preincubated with 20 ng of labeled target DNA.

Analysis of the Junction Sequences Derived from Template Jumps

Unless otherwise indicated the band corresponding to the template jumpproduct was excised from a polyacrylamide gel, eluted with 0.3 M sodiumacetate pH 5.2, 0.03% SDS for several hours at room temperature,phenol/chloroform extracted and ethanol precipitated. The isolated cDNAwas then used as a template for PCR amplification using the primerindicated in the figure legends. The PCR products were directly clonedinto mp18T2 (Burke et al., “R4, a non-LTR Retrotransposon Specific tothe Large Subunit rRNA Gene of Nematodes,” Nucleic Acids Res.23:4628-4634 (1995), which is hereby incorporated by reference in itsentirety) and individual clones sequenced.

Example 1 Recombinant Expression of the Bombyx mori R2 Protein

The expression construct, pR260, was derived from construct pR250 (Xiong & Eickbush, Cell 55:235-246 (1988), which is hereby incorporated by reference in its entirety). A 3.5 kb SmaI fragment of pR250 from 18 bp upstream of the first methionine codon to the 3′ end untranslated region was subcloned into pUC18 in-frame with the lacZ gene. E. coli strain JM109/pR260 was grown at 37° C. in LB broth until an OD₅₉₅=0.5-0.6. Isopropylthio-β-galactoside (IPTG) was then added to a final concentration of 0.2 mM and the cultures were further incubated for 1 hour. Cells are harvested by centrifugation, washed in cold 50 mM Tris-HCl, pH 8.0, and collected by centrifugation. The following procedure is described for 1.5 liters of cells but can be scaled to larger or smaller culture volumes. All procedures are conducted at 0-4° C. The cell pellets are resuspended in 6.8 ml buffer A (0.1 mM Tris-HCl pH 7.5, 5 mM EDTA, 50% glycerol) and incubated for 30 minutes in 5 mM dithiothreitol (DTT), 2 mM benzamidine-HCl, and 2 mg/ml lysozyme. 32 ml of buffer B (0.1 M Tris-HCl pH 7.5, 1 M NaCl, 5 mM DTT, 0.2% Triton X-100, 10 mM MgCl₂, 2 mM benzamidine) is then added, followed by an additional 30 minutes incubation. The lysate is centrifuged in a SW50.1 rotor at 33,000 rpm for 20 hours. The upper 1 ml of the supernate from each tube contains little R2 protein and is discarded. The remaining 4 ml of supernate from each tube is decanted and diluted with H₂O to lower the NaCl concentration to 0.4 M. The diluted crude extract is loaded onto a 15 ml Q Sepharose-fast-flow column (Pharmacia) equilibrated in 0.4 M NaCl/buffer C (25 mM Tris-HCl pH 7.5, 2 mM DTT). The column is washed with 50 ml of the 0.4 M NaCl/buffer C, and the R2 protein eluted with 0.6 M NaCl/buffer C. Fractions containing the R2 endonuclease activity are pooled, dialyzed against 0.2 M NaCl/buffer D (25 mM Tris-HCl pH 7.5, 2 mM DTT, 10% glycerol), and applied to a 1.5 ml DNA-cellulose column (Pharmacia) equilibrated with 0.4 M NaCl/buffer D. The column is washed with 9 ml 0.4 M NaCl/buffer D and eluted with 0.8 M NaCl/buffer D. R2 protein eluted from the DNA cellulose column is concentrated approximately 5-fold on a Centricon-50 column (Amicon) and dialyzed against 50% glycerol, 0.4 M NaCl, 25 mM Tris-HCl (pH 7.5) and 2 mM DTT at 4° C. A typical final volume is 100-200 μl of concentrated solution containing from 5-15 ng/μg R2 protein. The protein can be stored after dialysis at −20° C. for several months with only minor decreases in activity.

Protein concentrations were determined on SDS-polyacrylamide gels using the fluorescent stain SYPRO Orange (Bio-Rad Laboratories). The intensity of the R2 band was compared with known concentrations of bovine serum albumin using the fluoroimaging function of a Storm 860 PhosphorImager and Image Quant.

Example 2 RNA Template Jumping After Target-Primed Reverse Transcription

The signature step of the TPRT reaction is the use of the 3′ hydroxyl group released by first-strand cleavage of the DNA target site as primer for cDNA synthesis. This cleavage/reverse transcription reaction can be studied in vitro using purified components as shown in FIG. 1. The substrate in the assay is a uniformly ³²P-labeled DNA fragment containing the 28S rRNA gene insertion site. The R2 cleavage site on this substrate is positioned such that a 110 nt fragment is used as the primer. The RNA templates added to the reaction contain the minimum sequences needed to initiate the TPRT reaction: the 3′ untranslated region of the R2 element. The R2 RNA templates are either 254 nt in length, if the RNA ends at the precise 3′ junction of the R2 element, or 274 nt in length, if the RNA extends 20 nt into the downstream 28S gene sequences.

Denaturing polyacrylamide gel electrophoresis of typical TPRT reactions is shown in FIG. 1B. To allow maximum separation of the TPRT products, the small, previously described (Luan et al., “Reverse Transcription of R2Bm RNA is Primed by a Nick at the Chromosomal Target Site: A Mechanism for non-LTR Retrotransposition,” Cell 72:595-605 (1993), which is hereby incorporated by reference in its entirety) DNA cleavage products have been run off the bottom of the gel. The major TPRT products generated from the 254 nt R2 RNA (lane 1) and the 274 nt R2 RNA (lane 2) are both approximately 365 nt in length because reverse transcription starts at the 3′ end of the R2 sequences on the RNA template, irrespective of whether this sequence is located at the 3′ end or an internal position of the template (110 nt DNA primer+254 nt RNA template=364 nt) (Luan and Eickbush, “Downstream 28S Gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell. Biol. 16:4726-4734 (1996), which is hereby incorporated by reference in its entirety). Also visible in both lanes are longer reaction products. A distinct band at 620 nt and a faint band at 870 nt are present in lane 1. The 620 nt product could be formed if the R2 reverse transcriptase, after completing synthesis of the first RNA template, was able to jump to the 3′ end of a second RNA template and continue synthesis (110 nt+2×254 nt=618 nt). The faint 870 nt band could be explained by two consecutive jumps (110 nt+3×254 nt=872 nt). In the case of the 274 nt R2 template (lane 2), the longer cDNA products are about 640 and 910 nt. Because the TPRT products generated with the 274 nt R2 RNA are 20 and 40 nt longer than those formed with the 254 nt RNA, the putative jumps between templates would appear to involve the 3′ end of the second template, not the internal site used to initiate the TPRT reaction.
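The length arithmetic in the preceding paragraph can be written out as a short calculation. The following Python sketch is illustrative only and the function name is hypothetical. It assumes, as the paragraph infers, that the initial product covers the 110 nt primer plus the 254 nt of R2 sequence, and that each template jump adds one full-length template copied from its 3′ end.

```python
def tprt_product_lengths(primer_nt: int, r2_nt: int, template_nt: int,
                         max_jumps: int = 2) -> list:
    """Expected cDNA lengths after 0..max_jumps template jumps.

    The first template contributes only its R2 sequence (r2_nt), because TPRT
    initiates at the 3' end of the R2 sequence; each jump adds template_nt.
    """
    return [primer_nt + r2_nt + jumps * template_nt
            for jumps in range(max_jumps + 1)]


lane1 = tprt_product_lengths(110, 254, 254)  # 254 nt R2 RNA
lane2 = tprt_product_lengths(110, 254, 274)  # 274 nt R2 RNA
print(lane1)
print(lane2)
```

For the 274 nt template, the predicted products differ from those of the 254 nt template by 20 and 40 nt, in line with the observed bands at about 640 and 910 nt.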

To obtain direct support for jumps between RNA templates, the 638 nt cDNA fragment was isolated from lane 2 and the putative jump region of the cDNA was PCR amplified. The sequences of individual cloned products are shown in FIG. 2. All six sequenced junctions revealed that the R2 enzyme had extended to the terminal 5′ nucleotide of the first RNA template and continued polymerization at the first 3′ nucleotide of the second RNA template. One clone contained a six nucleotide deletion near, but not at, the 3′ end of the second RNA template. Four of the six junctions had an additional nucleotide between the two RNA sequences. These extra nucleotides could have been added by the R2 reverse transcriptase during the jump between templates. A similar addition of non-templated nucleotides has previously been observed when the R2 reverse transcriptase initiates the TPRT reaction (Luan et al., “Reverse Transcription of R2Bm RNA is Primed by a Nick at the Chromosomal Target Site: A Mechanism for non-LTR Retrotransposition,” Cell 72:595-605 (1993); and Luan and Eickbush, “RNA Template Requirements for Target DNA-Primed Reverse Transcription by the R2 Retrotransposable Element,” Mol. Cell. Biol. 15:3882-3891 (1995), each of which is hereby incorporated by reference in its entirety). The extra nucleotides could also have been generated during the in vitro synthesis of the RNA template. T7 RNA polymerase has been shown to add an additional residue (usually A) in DNA run-off reactions similar to those employed to generate RNA templates (see Milligan and Uhlenbeck, “Synthesis of Small RNAs using T7 RNA Polymerase,” Meth. Enzymol. 180:51-62 (1989), which is hereby incorporated by reference in its entirety).

Reverse transcription of the terminal nucleotides of the donor andacceptor RNAs eliminates the possibility that the jumps between RNAtemplates are promoted by annealing of the 5′ end of the newlysynthesized cDNA to the acceptor RNA template. Furthermore, initiationof reverse transcription at the terminal 3′ nucleotide of the 274 ntacceptor RNA template, rather than 20 nucleotides internally, suggestsan important role of free 3′ ends rather than the RNA secondarystructure (Mathews et al., “Secondary Structure Model of the RNARecognized by the Reverse Transcriptase from the R2 RetrotransposableElement,” RNA 3:1-16 (1997), which is hereby incorporated by referencein its entirety) in the template jumping reaction. These properties ofthe R2 reverse transcriptase are in sharp contrast to the strandtransfers which occur in the reverse transcription cycles ofretroviruses and LTR-retrotransposons.

The R2 reverse transcriptase, at least under the in vitro conditions described here, cannot efficiently use the 3′ end of an RNA:DNA hybrid to initiate reverse transcription. Therefore, the template jumps in FIG. 1B appear to involve the ability of the actively elongating R2 reverse transcriptase to associate with the 3′ end of a second RNA template before it dissociates from the first RNA template. This reaction can thus be viewed as continuous cDNA synthesis on non-continuous RNA templates.

If it is assumed that the R2 enzyme, upon reaching the 5′ end of thefirst RNA template, has only limited time to bind another RNA templatebefore it dissociates, then one would predict that the frequency of thetemplate jumps would be dependent upon the concentration of free RNAends in the reaction. Therefore, a series of reactions were conductedwhere the concentration of the 254 nt RNA template was incrementallyincreased. The frequency of the template jumps (618 nt fragment)relative to the total TPRT products over a 100 fold range in RNAconcentration was plotted in FIG. 3. As predicted, the frequency oftemplate jumping increased as the concentration of RNA increased. At thehighest concentration of RNA tested, 40 nM, approximately 13% of theTPRT reactions underwent a template jump.

Example 3 RNA Priming of the Reverse Transcription Reaction

To determine if there is specificity for the RNA used as acceptor in thetemplate jump, the TPRT assays were also conducted in the presence of anexcess of non-R2 competitor RNA. The 334 nt competitor RNA was atranscript of the pBSII(SK−) plasmid. It has previously been shown thatonly those RNAs that contain the 3′ untranslated region of the R2element can be used as templates in the TPRT reaction (Luan andEickbush, “RNA Template Requirements for Target DNA-primed ReverseTranscription by the R2 Retrotransposable Element,” Mol. Cell. Biol.15:3882-3891 (1995), which is hereby incorporated by reference in itsentirety). This specificity was confirmed in FIG. 4, as the only initialTPRT product observed was the approximately 365 nt fragment generatedfrom the R2 RNA template. TPRT products generated from the longer vectorRNA would be approximately 445 nt in length. In a similar manner, theonly products resulting from template jumps between RNAs wereapproximately 640 nt and 910 nt in length (compare lanes 1 and 3),indicating that only the R2 RNA was being used as acceptor of the jumpeven in the presence of an 8 fold molar excess of the competitor RNA.

Example 4 Reverse Transcription in the Absence of DNA Target Site

It was determined whether the R2 enzyme could undergo template jumpsduring reverse transcription reactions that are primed by a non-specificmethod of annealing a short DNA oligonucleotide to an RNA template (FIG.5B). In such primer extension assays, both R2 and non-R2 RNA can be usedby the R2 reverse transcriptase as templates (Luan et al., “ReverseTranscription of R2Bm RNA is Primed by a Nick at the Chromosomal TargetSite: A Mechanism for non-LTR Retrotransposition,” Cell 72:595-605(1993), which is hereby incorporated by reference in its entirety). Anextension assay using a DNA oligonucleotide annealed to the 20 nt at the3′ end of the 334 nt vector RNA is shown in FIG. 5A, lane 1. As expectedthe major cDNA product was 334 nt in length corresponding to simpleextension by the reverse transcriptase to the end of the vector RNA.Also produced was a cDNA fragment approximately 670 nt, the lengthexpected for reverse transcription of two consecutive RNA molecules.Thus, in the absence of the DNA target and R2 RNA, the R2 reversetranscriptase can also undergo template jumps between vector RNAsequences.

During analysis of these primer extension assays, it was noticed that the synthesis of cDNA was not completely dependent upon the presence of a DNA primer annealed to the RNA template. Approximately 20% of the cDNA synthesis could be generated without a primer, suggesting an alternative means of priming reverse transcription. It has previously been shown that under conditions of a TPRT reaction, the R2 protein can use the 3′ end of a second RNA molecule to prime reverse transcription (Luan and Eickbush, “Downstream 28S gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell. Biol. 16:4726-4734 (1996), which is hereby incorporated by reference in its entirety). Sequence analysis of the products of these reactions indicated that the ‘primer’ RNA had not annealed to the R2 ‘template’ RNA (Luan and Eickbush, “Downstream 28S gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell. Biol. 16:4726-4734 (1996), which is hereby incorporated by reference in its entirety). Here, it is shown that elimination of both the DNA target and R2 RNA from the reaction enables the protein to conduct this RNA-priming with non-specific RNA templates (diagrammed in FIG. 5B).

Results of an RNA-primed reverse transcription reaction using the 334 nt vector RNA template are shown in FIG. 5A, lanes 2 and 3. In lane 2, the reaction products have been treated with RNase A before separation on the polyacrylamide gel, while the reaction products in lane 3 have not been treated with RNase A. In lane 3, the major product was a diffuse band slightly larger than 600 nt. After treatment with RNase A, the products were reduced to a major band at approximately 324 nt with faint bands extending up to 334 nt. This reduced length of the cDNA products compared to lane 1 indicated that the preferred site of initiation of reverse transcription in the RNA-primed reaction was about 10 nt from the 3′ end of the RNA template. The presence of a 660 nt cDNA product in lane 2 indicated that the R2 enzyme can also undergo template jumps in these RNA-primed reactions.

Example 5 RNA Preferences in the Absence of the DNA Target Site

In FIG. 6, the efficiency of RNA-primed reverse transcription andtemplate jumping are compared between three RNAs: a 183 nt non-R2 RNAderived from pBSII(SK−), the 254 nt R2 RNA, and 334 nt vector RNA.Reverse transcription was primed by the RNAs themselves (no DNAprimers), and all products were digested with RNase A to remove theseRNA primers before electrophoresis. In the case of the short vector RNA(lane 1), the initial cDNA products were approximately 180 nt in length,which is consistent with reverse transcription starting near the 3′ endof the RNA template. Template jumping was highly efficient with this RNAas cDNA fragments of 360, 540, 720 and 900 nt were generatedrepresenting one, two, three, and four consecutive template jumps. Withthe R2 RNA template (lane 2), the major cDNA fragments were about 250nt, consistent with reverse transcription starting near the 3′ end ofthe RNA template, while a faint band approximately 15 nt shorter thanthe major band suggested cDNA synthesis also initiated at a moreinternal site. One, two, and three template jumps were seen with the R2RNA template. In the case of the 334 nt vector RNA, priming of cDNAsynthesis occurred at several sites near the 3′ end of the RNA, and bothsingle and double template jumps were detected (660 and 1000 ntfragments).

The relative efficiencies of the initial RNA-primed reverse transcription reaction and of the template jumps for each of the three RNAs are compared in Table II (below). The efficiency of the initial RNA-primed reverse transcription step was highest with the R2 RNA template and lowest for the 334 nt RNA. The frequency of template jumps once reverse transcription initiated was 13-15% with the 254 and 183 nt RNA, but only 4% with the 334 nt RNA.

TABLE II
RNA Specificity of the RNA-primed Reverse Transcription Assay

RNA template                      RNA-primed RT    Frequency of Template Jumps
254 nt (R2 RNA)                   1.00             12.9%
183 nt (vector RNA)               0.70             15.1%
334 nt (vector RNA)               0.44              3.9%
254 nt + 183 nt . . . 254 nt      1.59             10.9%
254 nt + 183 nt . . . 183 nt      0.17              2.1%
254 nt + 334 nt . . . 254 nt      0.86             10.1%
254 nt + 334 nt . . . 334 nt      0.14              0.7%

All values are derived from the data in FIG. 6. Values for the RNA-primed reverse transcription represent the combination of all RNA-unit-length bands visible in the lane and are given as a fraction relative to that supported by R2 RNA alone. The frequencies of template jumps are given as percentages of the total reverse transcripts that have undergone a template jump and are the combined values for all jumps (single, double, etc., corrected for the length of the cDNA fragment). Values for the template jumps in the competition reactions represent only those between similar templates (i.e., 254 nt to 254 nt jumps), and do not include the hybrid bands (i.e., 430 nt in lane 4 and 590 nt in lane 5), as it is uncertain which RNAs were the donors and acceptors in these jumps.

Lanes 4 and 5 of FIG. 6 are the cDNA products of competition experiments between equal molar ratios of the R2 RNA and the individual vector RNAs. In the case of the 183 nt and 254 nt competition, the significant reduction in intensity of the 180 nt band and the increased intensity of the 250 nt band indicated that the R2 RNA was the preferred template in the RNA-priming reaction. In the case of the 254 nt and 334 nt competition, reverse transcription of R2 RNA was again preferred over the longer vector RNA (lane 5). Thus, RNA-primed cDNA synthesis occurred more readily with the R2 RNA template than with either the shorter or longer vector RNAs. However, stimulation of cDNA synthesis from the R2 RNA by the addition of the 183 nt RNA (Table II) suggested that the short vector RNA functioned more efficiently than the R2 RNA in priming reverse transcription of the R2 template.

The competition experiments in FIG. 6 also suggested that template jumping occurred preferentially to the R2 template. In both lanes 4 and 5, the template jumps between R2 RNAs (fragments at 500 and 750 nt) were only slightly reduced compared to that in lane 3. Meanwhile, the levels of vector RNA to vector RNA jumps in lane 4 (360 nt) and lane 5 (664 nt) were reduced 5- to 7-fold (see Table II). Most jumps from the vector RNAs appear to have gone to the R2 RNA, as hybrid products (430 nt in lane 4, and 590 nt in lane 5) were readily apparent in both lanes. These results indicate that even in the absence of the R2 RNA template, the R2 protein can undergo RNA-primed reverse transcription and template jump with non-R2 RNA templates. However, the R2 protein prefers to initiate reverse transcription on the R2 RNA as well as to use R2 RNA as the acceptor of template jumps.
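The fold reductions cited above can be checked directly against the template-jump frequencies reported for FIG. 6. The snippet below is an illustrative sketch only: the dictionary layout and function name are hypothetical, and the percentages are copied from the table of template-jump frequencies in this Example.

```python
# Template-jump frequencies (%), transcribed from the reported FIG. 6 values.
# Keys are template lengths in nt; "alone" is the single-RNA reaction and
# "competition" is the same-template jump frequency when R2 RNA is also present.
JUMP_PERCENT = {
    183: {"alone": 15.1, "competition": 2.1},
    334: {"alone": 3.9, "competition": 0.7},
}

# R2 RNA (254 nt) jump frequency alone and in each competition reaction.
R2_ALONE = 12.9
R2_WITH = {183: 10.9, 334: 10.1}


def fold_reduction(alone, competition):
    """Ratio of the jump frequency alone to that under competition."""
    return alone / competition


if __name__ == "__main__":
    for nt, values in JUMP_PERCENT.items():
        vector_fold = fold_reduction(values["alone"], values["competition"])
        r2_fold = fold_reduction(R2_ALONE, R2_WITH[nt])
        print(f"{nt} nt vector RNA: {vector_fold:.1f}-fold; R2 RNA: {r2_fold:.2f}-fold")
```

Under these values the vector-to-vector jumps drop roughly 7-fold (183 nt) and 5.6-fold (334 nt), whereas the R2-to-R2 jumps drop by less than 1.3-fold, which is the asymmetry the text describes.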

To affirm that these template jumps in the RNA-primed reactions occurred without annealing of the cDNA to the acceptor RNA template, the cDNA region corresponding to a 334 nt donor to 254 nt acceptor template jump was PCR amplified from the total reaction products of lane 5. The total reaction products were used in the amplification rather than the purified 590 nt hybrid in order to sample the many faint products visible in FIG. 6A that are not of unit RNA length (i.e., cannot be attributed to any combination of 254 and 334 nt RNAs). As shown in FIG. 6B, in five of the seven sequenced junctions, the template jump occurred to the terminal 3′ nucleotide of the acceptor R2 RNA. In the two remaining cases, the jumps were to sites 5 and 7 nucleotides from the 3′ end, but in neither case did the jump involve sequences that would enable the cDNA made from the 334 nt RNA to anneal to the R2 RNA. Several of the junctions also represented reverse transcription reactions that did not proceed to the 5′ end of the 334 nt vector RNA. These premature jumps can explain many of the faint product bands seen on gels; however, it is not clear what fraction of these products were a result of RNA degradation, and what fraction represented template jumps before the reverse transcriptase reached the end of the first template.

Example 6 Directing Template Jumps Between RNAs

Increasing the ratio of DNA oligonucleotides used to anneal to the 3′ end of an RNA template will block this RNA from being an acceptor of a template jump. As shown in FIG. 7A (lane 3), using a 6-fold molar excess of primer to RNA template (three times higher than used in FIG. 5, lane 1) resulted in the synthesis of full-length cDNA products (~177 nt), but no template jumps. Template jumps were readily observed if a second RNA template that did not anneal to the DNA primer was added to the reaction. In lane 2, the addition of the 183 nt vector RNA resulted in nearly 40% of the cDNA undergoing a template jump (many underwent multiple consecutive jumps). In lane 1, the addition of the 334 nt vector RNA resulted in about 6% of the cDNA undergoing a template jump.

FIG. 7B shows the relative efficiency of directed template jumps between the donor RNA and even longer RNAs (334 nt, 600 nt and 1090 nt). The nature of these three RNAs is described above. While template jumps onto each of these RNAs were observed, the relative efficiencies of these jumps varied, as did the efficiency of the initial primer extension reaction itself. The reduction in efficiency of the primer extension reaction with different RNAs is possibly a result of the preference of the R2 protein to bind certain RNAs, which reduces its ability to bind the donor RNA. Meanwhile, the general decrease in the efficiency of the template jumps with longer RNAs is likely to be a mass effect in which the R2 protein is more likely to encounter the 3′ end of a shorter RNA than the 3′ end of a longer RNA.

Example 7 The DNA Target Site Stabilizes Interactions Between the R2 Protein and Its Template

While R2 RNA templates are preferred, vector RNAs (or any non-R2 RNAs) can compete as templates in RNA-primed reverse transcription reactions as well as acceptors in template jumps between RNAs. This contrasts with the activity of the R2 protein in the presence of the target DNA, in which only R2 RNAs can be used as templates in the target DNA-primed or RNA-primed reverse transcription reactions (Luan and Eickbush, “Downstream 28S Gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell. Biol. 16:4726-4734 (1996); and Mathews et al., “Secondary Structure Model of the RNA Recognized by the Reverse Transcriptase from the R2 Retrotransposable Element,” RNA 3:1-16 (1997), each of which is hereby incorporated by reference in its entirety), as well as acceptors of template jumps (FIG. 3). These results suggest that when the R2 protein is bound to DNA it has more specific structural requirements for the RNA used as template, which in turn might mean a higher affinity of the protein for the R2 RNA. Direct evidence for an increased affinity of the R2 protein for binding the R2 RNA in gel mobility shift assays is demonstrated below. As shown in FIG. 8A, R2 protein and labeled R2 RNA incubated in the absence of DNA do not generate a gel shift under the conditions of these reactions (lane 1); however, a shifted complex is readily observed if the DNA target is added to the incubation (lane 2). Similar complexes are not observed in the presence of single-stranded DNA (lane 3) or in the presence of double-stranded DNAs not corresponding to the target site (lane 4).

To confirm that the shifted complex in FIG. 8A is indeed a complex of RNA, protein, and DNA, the mobility shift assays were also conducted in the presence of labeled DNA target (FIG. 8B). In the presence of the R2 protein and the DNA target, a shifted complex is observed (lane 2). If R2 RNA is added to the protein and target DNA, then a further reduction in the mobility of the complex (a supershift) is observed (lane 1). The mobility of this RNA:protein:DNA complex is the same whether the DNA is labeled (FIG. 8B, lane 1) or the RNA is labeled (FIG. 8A, lane 2 and FIG. 8B, lane 3). These results demonstrate that the DNA target site increases the specific interactions between the R2 protein and R2 RNA. In the absence of DNA, only less stable interactions are possible between the R2 protein and its RNA template, which can explain why vector RNA can substitute for R2 RNA in the reverse transcription reactions conducted in the absence of target DNA.

Example 8 Template Jumps Onto Single-Stranded DNA

The ability of the R2 protein to template jump onto another RNA would suggest that the protein may also be able to template jump onto single-stranded DNA. Surprisingly, such jumps have only been observed when R2 RNA is being used as the initial template. As shown in FIG. 9, increasing concentrations of a 19 nt DNA primer complementary to the 3′ end of the 254 nt R2 RNA template (lanes 1-3) inhibited template jumping to the R2 RNA itself, similar to that shown above for vector RNA templates (FIG. 7). However, with the R2 template a new series of cDNA products was generated, approximately 20, 40 and 60 nt longer than the R2 RNA template. As shown below, these new products were template jumps onto the excess DNA oligonucleotide primers in the reaction. Such template jumps to DNA oligonucleotides were not observed with the donor RNA template in FIG. 7.

To determine whether the R2 protein has a sequence or length preference for these template jumps to ssDNA, two longer ssDNAs were tested. One ssDNA corresponded to a 50 nt segment from the pBSII(SK−) plasmid (FIG. 9A, lane 4), while a second, 54 nt ssDNA corresponded to the sequence of the 28S gene immediately upstream of the R2 insertion site (lane 5). This latter ssDNA was tested because it has previously been postulated that the R2 element may complete the R2 integration reaction by jumping onto these upstream 28S gene sequences and continuing synthesis (Burke et al., “The Domain Structure and Retrotransposition Mechanism of R2 Elements are Conserved Throughout Arthropods,” Mol. Biol. Evol. 16:502-511 (1999), which is hereby incorporated by reference in its entirety). Template jumps to both of these ssDNAs were readily observed. Based on the lengths of the reaction products, the jumps occurred to locations near the 3′ end of these oligonucleotides.

To obtain direct evidence for the use of ssDNA as an acceptor of template jumps, the 300-310 nt reverse transcription products from the reaction in lane 5 were excised and PCR amplified using one primer within the 54 bp extension and a second primer within the R2 sequence. The sequences of individual clones are shown in FIG. 9B. In most cases, the RT extended to the 5′ end of the R2 RNA template and then jumped to the terminal 3′ nucleotide of the ssDNA. In two cases, the jump occurred to positions 2 and 9 nt from the 3′ end of the primer. While these could represent jumps to internal positions of the ssDNA, it has been found that the R2 endonuclease has single-stranded 3′ exonuclease activity. Thus, many of these jumps to apparent internal locations near the end of the oligonucleotide may be jumps to the 3′ end of a partially degraded ssDNA template. As was seen with the template jumps from RNA to RNA, these RNA to ssDNA template jumps also contained additional, non-templated nucleotides.

Example 9 Retroviral Reverse Transcriptases Are Unable To Jump Templates

All of the reactions that have been conducted with the R2 protein herein have also been conducted using commercially available retroviral RTs. Consistent with many previous studies (see Coffin et al., Retroviruses, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1997), which is hereby incorporated by reference in its entirety), these retroviral enzymes were unable to conduct RNA-primed reverse transcription and template jumps to either RNA or ssDNA templates. FIG. 10 is an example of such a comparison in which an R2 RNA template has been primed with a short oligonucleotide. Full-length cDNA products were seen with the R2 reverse transcriptase, as well as jumps to the DNA oligonucleotide and RNA template (lane 2). In contrast, under identical conditions the longest products generated by the AMV RT were only full-length reverse transcripts of the RNA template (lane 1). The greater abundance of shorter cDNA products seen with the retroviral enzyme was a reflection of the reduced ability of the AMV RT to extend to the 3′ end of an RNA template compared to the R2 enzyme.

Based on the evidence provided in Examples 2-9, it is evident that the R2 protein or polypeptide possesses two unusual properties.

First, the R2 RT can jump between RNA templates. The cDNA strands from these jumps frequently contain the terminal nucleotides of the donor and acceptor RNA molecules, indicating that these jumps do not involve annealing of the newly formed cDNA strands to the acceptor RNA template. Thus, the R2 protein can conduct continuous cDNA synthesis on non-continuous RNA templates. In contrast, strand transfer by retroviral RTs requires sequence identity between the donor and acceptor RNA templates (Peliska and Benkovic, “Mechanism of DNA Strand Transfer Reactions Catalyzed by HIV-1 Reverse Transcriptase,” Science 258:1112-1118 (1992); and DeStefano et al., “Requirements for Strand Transfer Between Internal Regions of Heteropolymer Templates by Human Immunodeficiency Virus Reverse Transcriptase,” J. Virol. 66:6370-6378 (1992), each of which is hereby incorporated by reference in its entirety). Retroviral transfers are accomplished through catalytic removal of the donor RNA from the cDNA strand by an associated RNase H domain, which allows the cDNA to anneal to the acceptor RNA molecule. The R2 RT has no RNase H domain (Malik et al., “The Age and Evolution of non-LTR Retrotransposable Elements,” Mol. Biol. Evol. 16:793-805 (1999), which is hereby incorporated by reference in its entirety), and no such activity has been detected in vitro (Luan et al., “Reverse Transcription of R2Bm RNA is Primed by a Nick at the Chromosomal Target Site: A Mechanism for non-LTR Retrotransposition,” Cell 72:595-605 (1993), which is hereby incorporated by reference in its entirety). The only similarity between the jumps by the R2 and retroviral RTs is that both enzymes can add non-templated nucleotides to the cDNA at the end of the donor RNA template (Peliska and Benkovic, “Mechanism of DNA Strand Transfer Reactions Catalyzed by HIV-1 Reverse Transcriptase,” Science 258:1112-1118 (1992), which is hereby incorporated by reference in its entirety).
In the case of the retroviral mechanism, these extra nucleotides lead to hypermutability in the template switch region of the genome. As will be described below, template jumping by the R2 protein may explain the high sequence variation at the 5′ junction of R2 elements.

A second unusual property of the R2 RT is that it can use the 3′ end of a second RNA molecule to initiate reverse transcription. Such RNA-primed reactions have been previously characterized as an alternative to the TPRT reaction when the R2 protein was bound to the DNA target site (Luan and Eickbush, “Downstream 28S Gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell. Biol. 16:4726-4734 (1996), which is hereby incorporated by reference in its entirety). Similar to the TPRT reaction itself, RNA-priming did not require the annealing of the primer RNA to the template RNA (Luan and Eickbush, “Downstream 28S Gene Sequences on the RNA Template Affect the Choice of Primer and the Accuracy of Initiation by the R2 Reverse Transcriptase,” Mol. Cell. Biol. 16:4726-4734 (1996), which is hereby incorporated by reference in its entirety). It was shown above that when the R2 protein is not bound to the DNA target, any RNA can be reverse transcribed from its 3′ end by this RNA-priming reaction. RNA-priming without the annealing of the primer RNA to the template RNA has not been observed for retroviral RTs. However, the RT encoded by the Mauriceville mitochondrial retroplasmid of Neurospora crassa has been shown to be capable of using the 3′ ends of single-stranded DNA to prime reverse transcription in the absence of significant sequence identity (Wang et al., “The Mauriceville Plasmid of Neurospora crassa: Characterization of a Novel Reverse Transcriptase that Begins cDNA Synthesis at the 3′ End of Template RNA,” Mol. Cell. Biol. 12:5131-5144 (1992); and Kennell et al., “The Mauriceville Plasmid of Neurospora spp. Uses Novel Mechanisms for Initiating Reverse Transcription in vitro,” Mol. Cell. Biol. 14:3094-3107 (1994), each of which is hereby incorporated by reference in its entirety).

The different properties of the R2 and retroviral RTs are perhaps not unexpected because these enzymes differ substantially in size and are highly divergent in sequence (Xiong and Eickbush, “Origin and Evolution of Retroelements Based on their Reverse Transcriptase Sequences,” EMBO J. 9:3351-3362 (1990), which is hereby incorporated by reference in its entirety). Indeed, it is easier to align the amino acid sequence of the RT domains of R2 and other non-LTR retrotransposons with the comparable domains of mitochondrial group II introns and retroplasmids, bacterial msDNA and even telomerase than with the retroviral and LTR retrotransposon domains (Eickbush, “Origin and Evolutionary Relationships of Retroelements,” In The Evolutionary Biology of Viruses (Morse, S. S. ed.), pp. 121-157, Raven Press, New York (1994); and Nakamura et al., “Telomerase Catalytic Subunit Homologs from Fission Yeast and Human,” Science 277:955-959 (1997), each of which is hereby incorporated by reference in its entirety). As a result, the phylogenetic relationship of these various retroelements can be established with some confidence, while the relationship of these elements to the retroviruses and LTR retrotransposons remains controversial (Nakamura and Cech, “Reversing Time: Origin of Telomerase,” Cell 92:587-600 (1998); and Malik and Eickbush, “Phylogenetic Analysis of Ribonuclease H Domains Suggests a Late, Chimeric Origin of LTR Retrotransposable Elements and Retroviruses,” Genome Res. 11:1187-1197 (2001), each of which is hereby incorporated by reference in its entirety).

The R2 RT also shares with the other retroelement RTs the ability to specifically bind the RNA that will be used as template for cDNA synthesis. Priming of reverse transcription by these different enzymes does not require extensive annealing of the template to an oligonucleotide primer. The primer is the 3′ end of a cleaved chromosomal site in the cases of the non-LTR retrotransposons and group II introns (Luan et al., “Reverse Transcription of R2Bm RNA is Primed by a Nick at the Chromosomal Target Site: A Mechanism for non-LTR Retrotransposition,” Cell 72:595-605 (1993); and Cousineau et al., “Retrohoming of a Bacterial Group II Intron: Mobility Via Complete Reverse Splicing, Independent of Homologous DNA Recombination,” Cell 94:451-462 (1998), each of which is hereby incorporated by reference in its entirety), the 3′ end of the chromosome itself in the case of telomerase (Nakamura et al., “Telomerase Catalytic Subunit Homologs from Fission Yeast and Human,” Science 277:955-959 (1997), which is hereby incorporated by reference in its entirety), the 2′ hydroxyl of an internal G residue of the RNA template in the case of msDNA (Inouye and Inouye, “Bacterial Reverse Transcriptase,” In Reverse Transcriptase (Goff, S. & Skalka, A., eds.), pp. 391-410, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1993), which is hereby incorporated by reference in its entirety), and either de novo or the 3′ end of another cDNA in the case of the Mauriceville plasmid (Kennell et al., “The Mauriceville Plasmid of Neurospora spp. Uses Novel Mechanisms for Initiating Reverse Transcription in vitro,” Mol. Cell. Biol. 14:3094-3107 (1994); and Wang and Lambowitz, “The Mauriceville Plasmid Reverse Transcriptase Can Initiate cDNA Synthesis de novo and May Be Related to Reverse Transcriptase and DNA Polymerase Progenitor,” Cell 75:1071-1081 (1993), each of which is hereby incorporated by reference in its entirety).
Only retroviruses and LTR retrotransposons use an annealed RNA to prime reverse transcription of their RNA template (reviewed in Levin, “It's Prime Time for Reverse Transcriptase,” Cell 88:5-8 (1997), which is hereby incorporated by reference in its entirety).

Another common feature of these various retroelement RTs is that the region of the protein homologous to the ‘fingers’ and ‘palm’ domains is considerably larger than that of the retroviruses (Xiong and Eickbush, “Origin and Evolution of Retroelements Based on their Reverse Transcriptase Sequences,” EMBO J. 9:3351-3362 (1990), which is hereby incorporated by reference in its entirety). All non-viral reverse transcriptases contain an extra segment that is not found in retroviral RTs (Eickbush, “Origin and Evolutionary Relationships of Retroelements,” In The Evolutionary Biology of Viruses (Morse, S. S. ed.), pp. 121-157, Raven Press, New York (1994), which is hereby incorporated by reference in its entirety), as well as additional segments in some groups. For example, non-LTR retrotransposons contain an additional segment between segments A and B, while group II intron RTs contain an extra region between segments B and C (Eickbush, “Origin and Evolutionary Relationships of Retroelements,” In The Evolutionary Biology of Viruses (Morse, S. S. ed.), pp. 121-157, Raven Press, New York (1994); and Nakamura et al., “Telomerase Catalytic Subunit Homologs from Fission Yeast and Human,” Science 277:955-959 (1997), each of which is hereby incorporated by reference in its entirety). Because the ‘fingers’ domain of retroviral RTs associates with the RNA template upstream of the active site (Kohlstaedt et al., “Crystal Structure at 3.5 Angstrom Resolution of HIV-1 Reverse Transcriptase Complexed with an Inhibitor,” Science 256:1783-1790 (1992); and Sarafianos et al., “Crystal Structure of HIV-1 Reverse Transcriptase in Complex with a Polypurine Tract RNA:DNA,” EMBO J. 20:1449-1461 (2001), each of which is hereby incorporated by reference in its entirety), these extra segments within the non-retroviral RTs are presumably involved in specific RNA-template interactions.
Indeed, Chen and Lambowitz (“De novo and DNA Primer-mediated Initiation of cDNA Synthesis by the Mauriceville Retroplasmid Reverse Transcriptase Involve Recognition of a 3′ CCA Sequence,” J. Mol. Biol. 271:311-332 (1997), which is hereby incorporated by reference in its entirety) have suggested that this ‘finger’ domain is involved in specific recognition of the CCA sequence involved in de novo initiation of reverse transcription by the Mauriceville RT.

In FIG. 11, a simple model for the R2 RT is provided which can help explain its different properties compared to retroviral enzymes. This model has many similarities to that proposed for the Mauriceville enzyme (Chen and Lambowitz, “De novo and DNA Primer-mediated Initiation of cDNA Synthesis by the Mauriceville Retroplasmid Reverse Transcriptase Involve Recognition of a 3′ CCA Sequence,” J. Mol. Biol. 271:311-332 (1997), which is hereby incorporated by reference in its entirety). Based on the additional amino acid motifs found in the palm and fingers regions of the R2 RT, and the demonstrated ability of the R2 enzyme to specifically bind its own RNA template, the R2 enzyme is shown to have significant binding potential to the RNA template upstream of the active site (FIG. 11A). There are two components to this template binding: specific affinity of the protein for the RNA structure assumed by the 3′ UTR sequences of the R2 element, and the ability of the protein to bind the free 3′ end of any RNA molecule. The ability to bind RNA 3′ ends could explain how the R2 protein can template jump onto a second RNA template when it completes transcription of the first RNA template (panel B). R2 RNA templates are preferred in these jumps because these RNAs have higher affinity for the RT. These properties of the R2 protein contrast with HIV RT and its associated RNase H domain, in which the major interactions of the protein are with the RNA template downstream of the active site (Kohlstaedt et al., “Crystal Structure at 3.5 Angstrom Resolution of HIV-1 Reverse Transcriptase Complexed with an Inhibitor,” Science 256:1783-1790 (1992); and Sarafianos et al., “Crystal Structure of HIV-1 Reverse Transcriptase in Complex with a Polypurine Tract RNA:DNA,” EMBO J. 20:1449-1461 (2001), each of which is hereby incorporated by reference in its entirety).
Template switching by retroviral enzymes involves annealing of the acceptor RNA to the cDNA downstream of the active site (Peliska and Benkovic, “Mechanism of DNA Strand Transfer Reactions Catalyzed by HIV-1 Reverse Transcriptase,” Science 258:1112-1118 (1992); and DeStefano et al., “Requirements for Strand Transfer Between Internal Regions of Heteropolymer Templates by Human Immunodeficiency Virus Reverse Transcriptase,” J. Virol. 66:6370-6378 (1992), each of which is hereby incorporated by reference in its entirety).

The absence of the RNase H domain in the R2 RT means that RNA 3′ ends may also be able to bind the opposite (downstream) side of the RT active site (FIG. 11B). Thus, the ability of the R2 enzyme to use RNA to prime reverse transcription in the absence of sequence identity can be explained by the 3′ ends of two RNA molecules simultaneously binding either end of the presumed major groove that contains the active site of the enzyme. The R2 RT has significant preference to use the R2 RNA as template (upstream binding), but little specificity for the RNA that primes the reaction (downstream binding).

Priming of reverse transcription by the DNA cleavage site can be viewed as similar to that of RNA-priming (panel C). When the R2 protein is bound to the nicked DNA target site, the 3′ end of a cleaved DNA strand is positioned adjacent to the RT active site. When the R2 protein is free in solution, the 3′ end of RNA can be bound to this site. In FIG. 11, the DNA end has been drawn unpaired (to be used as primer), to emphasize its potential similarity to the RNA-priming reaction; however, there is no direct evidence for this suggestion.

It seems unlikely that template jumping between RNA templates plays a role in R2 retrotransposition. However, the ability of the enzyme to conduct such jumps can be viewed as support for one possible model of how the 5′ end of the reverse transcribed product is attached to the upstream DNA target site. Analysis of the sequence variation that exists at the 5′ end of R2 elements from a number of arthropod species has led us to suggest a model in which the RT jumps from the R2 RNA template onto the upstream DNA target (FIG. 11C) (Burke et al., “The Domain Structure and Retrotransposition Mechanism of R2 Elements are Conserved Throughout Arthropods,” Mol. Biol. Evol. 16:502-511 (1999); and George et al., “Analysis of the 5′ Junctions of R2 Insertions with the 28S gene: Implications for non-LTR Retrotransposition,” Genetics 142:853-863 (1996), each of which is hereby incorporated by reference in its entirety). R2 5′ junction variation includes apparent non-templated nucleotides similar to those resulting from in vitro template jumps (FIGS. 2, 6 and 9). R2 5′ junctions sometimes contain short deletions of the DNA upstream of the cleavage site. These deletions could be explained by the jumps occurring to internal nucleotides near the free 3′ end, again as seen above during in vitro jumps (FIGS. 2, 6 and 9). Finally, many R2 5′ junctions contain large deletions of the R2 element, indicating that no sequences at the 5′ end of the element are required for 5′ attachment. These junctions are readily explained by template jumps occurring prematurely or from RNA templates that are not full-length.

Example 10 R2 RT is More Processive Than Retroviral AMV RT

The length of nucleic acid that a polymerase can synthesize before dissociating from its template is usually referred to as its processivity. High processivity is desired of any RT to make full-length cDNA copies of RNA. The retroviral AMV polymerase is one of the best-characterized RTs and is known to be one of the most processive of the retroviral enzymes. FIG. 13 compares the processivity of the R2 RT with that of AMV RT in a simple primer extension assay with the 600 nt vector RNA as template. The reaction with AMV RT generates a larger percentage of cDNA products that are less than full length (AMV-RT, lane 1) compared to the R2 protein (R2-RT, lane 1). To confirm that the reaction products seen in FIG. 13 reflect the processivity of each enzyme (i.e., the length of cDNA synthesized before the RT dissociates from the RNA template) and not multiple rounds of elongation, the reactions were also conducted under conditions that do not allow the reinitiation of RT after its dissociation. These single round reactions, also called RT trap assays, involve the addition of heparin and an excess of poly(A)/oligo dT. Heparin inhibits reinitiation, while any reinitiation that might occur will be predominantly onto the more abundant poly(A)/oligo dT templates. To demonstrate the efficiency of the ‘trap’, lane 2 for each enzyme represents the addition of heparin and the poly(A)/oligo dT at the same time as the addition of the RT, thus preventing any synthesis of cDNA primed by the end-labeled DNA primer annealed to the 600 nt RNA.

In lane 3 of each panel, the RTs are first bound to the DNA primer/RNA template complex, and then heparin and poly(A)/oligo dT are added along with dNTPs to start reverse transcription. The length distribution of the cDNA synthesized by AMV RT in the presence of the trap was again significantly shorter than that of the cDNA synthesized by the R2-RT. The difference in the accumulation of cDNA transcripts by the R2 and AMV RTs is also illustrated by the graph in FIG. 14A. The tracings in this figure represent the RT product found in lane 3 of each enzyme in FIG. 13. The yield of full-length cDNA product (600 nt) versus total cDNA (all bands between 100 and 600 nt) was determined. The level of full-length products with R2 was ~16% of the total synthesis, or nearly four times higher than with AMV-RT (4.1%).

Similar processivity comparisons between the R2 and AMV RTs were also conducted with the 1094 nt RNA derivative of the pBSII(SK−) plasmid (FIG. 14B). Reverse transcription of the 1094 nt template by AMV-RT yields essentially no cDNA products longer than 450 nt. In contrast, the R2 RT yields considerable cDNA products over 500 nt in length, with a distinct full-length band at 1094 nt. These observations clearly suggest that under the conditions of these single round elongation reactions, R2 RT is more processive than the AMV-RT. It should be mentioned that the cDNA distributions in the non-trap and trap reactions in FIG. 13 (lanes 1 and 3, respectively) are similar for both enzymes. This result can be explained by the very short reaction times and low polymerase concentrations used in these assays. Both conditions significantly decrease the probability of reinitiation of the RT after it has dissociated.

Finally, it should be noted that the cDNAs that are less than full-length differed dramatically for the R2 and AMV RTs. In the case of the 600 nt template, the RT products generated by AMV reveal several distinct bands that are distributed along the RNA, while the distribution of RT products generated by the R2-RT is more diffuse, with only one distinct band of ~95 nt. In the case of the 1096 nt RNA, AMV produced distinct bands of lengths 130, 145, 160 and 300 nt, while the R2-RT produced weaker, but still distinct, bands of lengths 100, 180, 200, 220 and 240 nt. These differences in the length of the truncated cDNA reflect the different abilities of the RTs to transcribe regions of RNA with different primary and secondary structure. They also indicate that the shorter cDNA products produced by the RT are a result of enzyme dissociation and not the result of a degraded RNA template.

Example 11 The Higher Processivity of R2 is a Result of its Reduced Rate of Dissociation from the RNA Template

The higher processivity of the R2 RT compared to retroviral RTs could be a result of two properties. First, it might dissociate from the RNA template at a slower rate than retroviral enzymes. Second, it might elongate cDNA at a faster rate than the retroviral enzymes. To compare the dissociation rates of R2 and AMV RT, the 183 nt vector RNA with annealed end-labeled DNA primer was first preincubated with the appropriate RT to allow protein binding. To this complex were added heparin and excess poly(A)/oligo dT (the trap). After various periods of time, dNTPs were added to initiate reverse transcription, and the products were separated on polyacrylamide gels (FIG. 15A). The total amount of cDNA synthesis at each time point was determined on a phosphorimager and plotted in FIG. 15B. This experiment demonstrated a dramatic difference in the dissociation rate of the two RTs. The level of cDNA products generated by the R2-RT decreased only 2-fold even after a 45 minute incubation. Meanwhile, the level of cDNA products generated by the AMV-RT decreased 10-fold after only a 2 minute pre-incubation with the trap.

The data were fit using a decreasing single exponential function exp(−k_(off)*x), and the dissociation rate for the R2 RT was determined (k_(off)=0.33±0.04×10⁻³ sec⁻¹). Because it can be argued that the absence of substrate dNTP may significantly change the properties of the protein/template complex and, thus, may affect its stability, another stability experiment was conducted similar to that in FIG. 15A, but containing 5 μM dATP in the preincubation mixture. This nucleotide is the first to be incorporated from this RNA template and, thus, should represent complexes that have initiated reverse transcription. The k_(off) determined under these conditions was not significantly different (k_(off)=0.27×10⁻³ sec⁻¹). The approximate k_(off) determined for the AMV protein in the experiment in FIG. 15 (k_(off)=0.019±0.0021 sec⁻¹) is in good agreement with published values for the half time of AMV RT binding (~30 s), e.g., by DeStefano et al., J. Biol. Chem. 266:7423-7431 (1991), which is hereby incorporated by reference in its entirety.

Thus, the R2-RT dissociates from an RNA template nearly 60-fold more slowly than AMV-RT.
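The comparison of dissociation rates can be checked with a short calculation. The following is a minimal sketch, assuming the single exponential model exp(−k_(off)*x) and the rate constants reported above; the function and variable names are illustrative only and are not part of the original disclosure.

```python
import math

# Dissociation rate constants reported in the text (per second).
K_OFF_R2 = 0.33e-3   # R2 RT
K_OFF_AMV = 0.019    # AMV RT

def half_life(k_off):
    """Half time (seconds) of a complex that decays as exp(-k_off * t)."""
    return math.log(2) / k_off

def fraction_bound(k_off, t_seconds):
    """Fraction of RT still bound to the template after t_seconds with the trap."""
    return math.exp(-k_off * t_seconds)

# Fold difference in dissociation rate between the two enzymes.
fold_slower = K_OFF_AMV / K_OFF_R2

print(f"R2 dissociates {fold_slower:.0f}-fold more slowly than AMV")
print(f"AMV half time: {half_life(K_OFF_AMV):.0f} s")
print(f"R2 half time:  {half_life(K_OFF_R2) / 60:.0f} min")
print(f"AMV fraction bound after 2 min: {fraction_bound(K_OFF_AMV, 120):.2f}")
print(f"R2 fraction bound after 45 min: {fraction_bound(K_OFF_R2, 45 * 60):.2f}")
```

Under this model the AMV half time comes out in the tens of seconds and the R2 half time in the tens of minutes, in line with the roughly 10-fold and 2-fold decreases described for FIG. 15.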

To determine the elongation rates of the R2 RT, the protein was preincubated with the 1090 nt vector RNA/end-labeled DNA primer complex to allow association. The four dNTPs were then added for short periods of time and the reaction was abruptly stopped by the addition of SDS and ethanol. The cDNA products from the R2 RT are shown in FIG. 16A. The maximum length of cDNA synthesized at each time point can be used to determine an elongation rate of 11.0 nt/sec. This reaction has been conducted with a number of different templates. Plotted in FIG. 16B is a similar experiment conducted with a 600 nt RNA template using both R2 and AMV RT. In this experiment, the R2 RT elongation rate was calculated to be 14.7 nt/sec, similar to the rate calculated for the AMV-RT (12.9 nt/sec).

The combined results of FIGS. 15 and 16 demonstrate that the increased processivity of the R2 protein is a result of its higher stability on (i.e., reduced rate of dissociation from) the RNA template.
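To illustrate why similar elongation rates combined with very different dissociation rates imply different processivity, the following sketch estimates the mean extension per binding event as the elongation rate divided by k_(off). This is an idealized model assumed here for illustration (constant elongation rate, single exponential dissociation, no pausing at RNA structure); it is not a calculation from the original disclosure, and the function name is hypothetical.

```python
# Values reported in the text for the 600 nt template (elongation)
# and for the FIG. 15 experiment (dissociation).
R2_RATE_NT_PER_S = 14.7
AMV_RATE_NT_PER_S = 12.9
R2_K_OFF_PER_S = 0.33e-3
AMV_K_OFF_PER_S = 0.019

def mean_extension(rate_nt_per_s, k_off_per_s):
    """Idealized mean number of nucleotides added before the RT dissociates.

    Assumes constant elongation and exponential dissociation, so the mean
    residence time is 1 / k_off and the mean extension is rate / k_off.
    """
    return rate_nt_per_s / k_off_per_s

r2_nt = mean_extension(R2_RATE_NT_PER_S, R2_K_OFF_PER_S)
amv_nt = mean_extension(AMV_RATE_NT_PER_S, AMV_K_OFF_PER_S)

print(f"R2 idealized mean extension:  {r2_nt:.0f} nt")
print(f"AMV idealized mean extension: {amv_nt:.0f} nt")
print(f"Ratio (R2/AMV): {r2_nt / amv_nt:.0f}")
```

Because the two elongation rates are within about 15% of each other, nearly all of the difference in this estimate comes from k_(off). The model ignores pausing at structural features of the template, so it should be read as an upper-bound style comparison rather than a prediction of observed cDNA lengths.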

Example 12 The R2 RT is Not Blocked by the Secondary Structure of the RNA Template

The presence of truncated cDNA bands that are of specific lengths (see FIGS. 13-15), rather than a continuous range of cDNA lengths expected from a gradual dissociation of the RT from the RNA template, is believed to be a result of the RT pausing at structural features of the RNA template. As was suggested in the discussion of the data in FIGS. 13 and 14, it appears that the R2 RT responds differently and not as severely to these structural features, because the yield and sizes of these truncated cDNAs differed for the two enzymes. To directly determine the effects of RNA structure on the ability of the R2 protein to reverse transcribe RNA templates, RNA templates were engineered to contain precise hair-pin loops. It was determined that the R2 protein was readily able to transcribe through these loops. Unfortunately, DNA sequences with extremely long loops of this kind that can be transcribed by T7 RNA polymerase are difficult to clone in bacterial plasmids. Therefore, a somewhat different approach was utilized to generate such stable hairpins by simply annealing two complementary RNA molecules. The experimental approach (diagrammed in FIG. 17B) involved the 334 nt RNA template to which is annealed both a 19 nt oligodeoxynucleotide near the middle of the RNA (the primer), and a 117 nt RNA that is a perfect complement to the 5′ end of the RNA (the block). cDNA synthesis is monitored on gels by means of the end-labeled DNA primer. Both the R2 and AMV enzymes were tested with and without the RNA block (lanes 1 and 2, respectively) under two conditions: the presence of the poly(A)/oligo dT trap (left panel) and the absence of a trap to allow multiple re-initiations (right panel).

With the R2 RT, only full-length cDNA products are generated, irrespective of whether the RNA template contains the RNA block (lane 1) or lacks it (lane 2). In the case of the AMV-RT, the presence of the RNA block completely prevents any cDNA synthesis more than a few nucleotides past the beginning of the block. In the absence of the RNA block, full-length products are obtained with the AMV-RT, but even here much of the cDNA stops near the middle of the RNA, presumably as a result of the RNA secondary structure. These results dramatically reveal that the R2 RT is not significantly blocked by duplex regions of RNA templates. In another set of experiments, the R2 protein was determined to be capable of readily reverse transcribing poly(A) templates that are saturated by oligo(dT), conditions under which AMV-RT is severely inhibited.

At this point it is not clear whether the ability of the R2 RT to reverse transcribe through the annealed RNA is a result of the protein's ability to actively displace the annealed RNA strand from the template, or whether the remarkable stability of the R2 protein on its RNA template allows the enzyme to passively move through the duplex region during the random opening and closing (sometimes called breathing) associated with the ends of duplex nucleic acids.

Example 13 Effects of Temperature on the Reverse Transcription Reactions

Because weak RNA secondary structures are known to be highly dependent upon the temperature of the solution, increasing the temperature of the reaction is sometimes used as a means to minimize the effects of RNA structure on an RT reaction. In FIG. 18, the ability of the R2 and AMV RTs to reverse transcribe the 600 nt RNA template at temperatures ranging from 25° C. to 55° C. is compared. The reactions were again conducted for short periods with a low concentration of RT to promote the formation of products derived from only a single round of enzyme association. The total cDNA produced at each temperature was determined on a phosphoimager, and the fraction of the cDNA corresponding to full-length cDNA (~600 nt) at each temperature is plotted in FIG. 19.

In the case of the AMV-RT, high levels of cDNA synthesis were obtained at all temperatures, but the percentage of the cDNA product corresponding to full-length transcripts never exceeded 1.2%. The nature of the truncated products did not substantially differ at the different temperatures, suggesting that the changes in the secondary structure of the RNA associated with these different temperatures had minimal effect on enzyme dissociation.

In the case of the R2 RT, the total amount of cDNA synthesis increased from 25° C. to 40° C., at which point further temperature increases caused a dramatic loss of activity. This loss of activity is presumably a result of the denaturation of the protein. Surprisingly, even though the R2 protein is less active at lower temperatures, the fraction of the total cDNA that was full-length increased, such that at 25° C. over 20% of that cDNA was full-length. These results could be explained if rising temperatures increase the dissociation rate of the R2 protein from the RNA template to a greater extent than they increase the elongation rate.

Although the invention has been described in detail for the purpose of illustration, it is understood that such detail is solely for that purpose, and variations can be made therein by those skilled in the art without departing from the spirit and scope of the invention which is defined by the following claims.

1. A method of preparing a cDNA molecule comprising: contacting an RNA molecule, in the presence of dNTPs, with a non-LTR retrotransposon protein or polypeptide having reverse transcriptase activity, wherein the non-LTR retrotransposon protein or polypeptide is an R2 protein or polypeptide, under conditions effective for production of a cDNA molecule complementary to the RNA molecule, said contacting being carried out in the absence of a target DNA molecule of the non-LTR retrotransposon protein or polypeptide; and isolating the cDNA molecule.
2. (canceled)
3. The method according to claim 1, wherein the R2 protein or polypeptide is derived from an arthropod.
4. The method according to claim 3, wherein the arthropod is Bombyx mori.
5. The method according to claim 1, wherein the RNA molecule lacks a primer site to initiate reverse transcription.
6. The method according to claim 1, wherein the RNA molecule lacks a polyadenylation region.
7. The method according to claim 1, wherein said contacting is carried out in the presence of both a donor RNA molecule having a known sequence and an acceptor RNA molecule having a known sequence.
8. The method according to claim 1, wherein said contacting is carried out in the presence of a donor RNA molecule having a known sequence.
9. The method according to claim 1, wherein said contacting is carried out in the presence of an acceptor RNA molecule having a known sequence.
10. The method according to claim 1, wherein said contacting is carried out under isothermic conditions.
11. The method according to claim 1, wherein said contacting is carried out at a temperature of between about 20° C. and about 40° C.
12. The method according to claim 11, wherein said contacting is carried out at a temperature of between about 21° C. and about 35° C.
13. The method according to claim 1, wherein the RNA molecule includes a structure or an annealed duplex region that would interfere with retroviral reverse transcriptase function.
14. The method according to claim 1, wherein said contacting is carried out under conditions whereby a significant portion of the isolated cDNA molecules are substantially full length reverse transcripts of the RNA molecule.
15. The method according to claim 1, wherein the RNA molecule includes a polyadenylated region, the method further comprising: annealing a primer to the polyadenylated region of the RNA molecule prior to said contacting.
16. The method according to claim 15, wherein said contacting is carried out in the presence of an acceptor RNA molecule having a known nucleotide sequence.
17-26. (canceled)
27. A method of amplifying a cDNA molecule comprising: performing the method of claim 1 to obtain a single-stranded cDNA molecule that includes a region of interest; annealing a first primer to the single-stranded cDNA molecule at a position 3′ of the region of interest; and extending the first primer to form a complementary DNA strand including a complement of the region of interest.
28. The method according to claim 27 further comprising: dissociating the complementary DNA strand from the single-stranded cDNA molecule; annealing a second primer to the complementary DNA strand molecule at a position 3′ of the complement of the region of interest; and extending the second primer to form a second complementary DNA strand which is substantially the same as the single-stranded cDNA molecule at the region of interest.
29. The method according to claim 28 further comprising: dissociating the second complementary DNA strand from the complementary DNA strand; and repeating said annealing and extending of the first and second primers, using the second complementary DNA strand, to form third and fourth complementary DNA strands, the third complementary DNA strand being substantially the same as the first complementary strand and the fourth complementary DNA strand being substantially the same as the second complementary strand.
30. The method according to claim 27, wherein said performing is carried out under conditions effective for the non-LTR retrotransposon protein or polypeptide to jump from the RNA molecule to an acceptor RNA molecule having a known sequence, the single-stranded cDNA molecule comprising a first portion complementary to the RNA molecule and a second portion complementary to the acceptor RNA molecule, the second portion being located 3′ of the first portion.
31. The method according to claim 30, wherein the primer anneals to the second portion of the single-stranded cDNA molecule.
32. The method according to claim 27, wherein said performing is carried out under conditions effective for the non-LTR retrotransposon protein or polypeptide to jump from a donor RNA molecule having a known sequence to the RNA molecule, the single-stranded cDNA molecule comprising a first portion complementary to the donor RNA molecule and a second portion complementary to the RNA molecule, the second portion being located 3′ of the first portion.
33. The method according to claim 27 further comprising: exposing the single-stranded cDNA molecule to a terminal transferase in the presence of dCTPs to form an oligoC tail at the 3′ end of the single-stranded cDNA molecule.
34. The method according to claim 33, wherein said exposing is carried out prior to said annealing the first primer and the first primer anneals to the oligoC tail.
35. The method according to claim 27, wherein said performing is carried out under substantially isothermic conditions.
36. The method according to claim 27, wherein said performing is carried out at a temperature of between about 20° C. and about 40° C.
37. The method according to claim 27, wherein said performing is carried out at a temperature of between about 21° C. and about 35° C.
38. A method of amplifying a cDNA molecule comprising: performing the method according to claim 7 to obtain a single-stranded cDNA molecule that includes a region of interest, a region complementary of the donor RNA 5′ of the region of interest, and a region complementary of the acceptor RNA 3′ of the region of interest; annealing a first primer to the single-stranded cDNA molecule at a position 3′ of the region of interest; and extending the first primer to form a complementary DNA strand including a complement of the region of interest.
39. The method according to claim 38 further comprising: dissociating the complementary DNA strand from the single-stranded cDNA molecule; annealing a second primer to the complementary DNA strand molecule at a position 3′ of the complement of the region of interest; and extending the second primer to form a second complementary DNA strand which is substantially the same as the single-stranded cDNA molecule at the region of interest.
40. The method according to claim 39 further comprising: dissociating the second complementary DNA strand from the complementary DNA strand; and repeating said annealing and extending of the first and second primers, using the second complementary DNA strand, to form third and fourth complementary DNA strands, the third complementary DNA strand being substantially the same as the first complementary strand and the fourth complementary DNA strand being substantially the same as the second complementary strand.
41. A method of amplifying a cDNA molecule comprising: performing the method according to claim 16 to obtain a single-stranded cDNA molecule that includes a region of interest, an oligoT region 5′ of the region of interest, and a region complementary of the acceptor RNA 3′ of the region of interest; annealing a first primer to the single-stranded cDNA molecule at a position 3′ of the region of interest; and extending the first primer to form a complementary DNA strand including a complement of the region of interest.
42. The method according to claim 41 further comprising: dissociating the complementary DNA strand from the single-stranded cDNA molecule; annealing a second primer to the complementary DNA strand molecule at a position 3′ of the complement of the region of interest; and extending the second primer to form a second complementary DNA strand which is substantially the same as the single-stranded cDNA molecule at the region of interest.
43. The method according to claim 42 further comprising: dissociating the second complementary DNA strand from the complementary DNA strand; and repeating said annealing and extending of the first and second primers, using the second complementary DNA strand, to form third and fourth complementary DNA strands, the third complementary DNA strand being substantially the same as the first complementary strand and the fourth complementary DNA strand being substantially the same as the second complementary strand.
44-49. (canceled)